| Literature DB >> 30257206 |
Sheng Wang1, Jeffrey D Mandell2, Yogesh Kumar3, Nawei Sun2, Montana T Morris2, Juan Arbelaez2, Cara Nasello4, Shan Dong5, Clif Duhn2, Xin Zhao6, Zhiyu Yang3, Shanmukha S Padmanabhuni3, Dongmei Yu7, Robert A King8, Andrea Dietrich9, Najah Khalifa10, Niklas Dahl11, Alden Y Huang12, Benjamin M Neale7, Giovanni Coppola12, Carol A Mathews13, Jeremiah M Scharf7, Thomas V Fernandez8, Joseph D Buxbaum14, Silvia De Rubeis14, Dorothy E Grice14, Jinchuan Xing4, Gary A Heiman4, Jay A Tischfield4, Peristera Paschou15, A Jeremy Willsey16, Matthew W State17.
Abstract
We previously established the contribution of de novo damaging sequence variants to Tourette disorder (TD) through whole-exome sequencing of 511 trios. Here, we sequence an additional 291 TD trios and analyze the combined set of 802 trios. We observe an overrepresentation of de novo damaging variants in simplex, but not multiplex, families; we identify a high-confidence TD risk gene, CELSR3 (cadherin EGF LAG seven-pass G-type receptor 3); we find that the genes mutated in TD patients are enriched for those related to cell polarity, suggesting a common pathway underlying pathobiology; and we confirm a statistically significant excess of de novo copy number variants in TD. Finally, we identify significant overlap of de novo sequence variants between TD and obsessive-compulsive disorder and de novo copy number variants between TD and autism spectrum disorder, consistent with shared genetic risk.Entities:
Keywords: TIC Genetics; Tourette disorder; cell polarity; copy number variants; de novo variants; gene discovery; microarray genotyping; multiplex; simplex; whole exome sequencing
Mesh:
Substances:
Year: 2018 PMID: 30257206 PMCID: PMC6475626 DOI: 10.1016/j.celrep.2018.08.082
Source DB: PubMed Journal: Cell Rep Impact factor: 9.995
Figure 1.Study Overview
Our group previously generated and analyzed WES data from 511 TD trios, generated by the TIC Genetics (325 trios) and TAAICG (186 trios) consortia (Willsey et al., 2017). In this study, we expand the number of trios with WES data by 291 (92 from TIC Genetics, 18 from UTC, and 181 from TSGENESEE). We leverage recurrent de novo variants occurring within the same gene in unrelated individuals to identify a high-confidence gene, CELSR3. Next, we identify de novo CNVs from the WES data and significantly associate these variants with TD. Third, we replicate the association of de novo CNVs by analysis of microarray data from 399 partially overlapping TIC Genetics trios. Finally, based on the rate of de novo variants, we assess the genomic architecture of TD. CNVs, copy number variants; SSC, Simons Simplex Collection; TAAICG, Tourette Association of America International Consortium for Genetics; TD, Tourette disorder; TIC Genetics, Tourette International Collaborative Genetics consortium; TSGENESEE, Tourette Syndrome Genetics Southern and Eastern Europe Initiative; UTC, Uppsala Tourette Cohort.
See Figure S1 for an overview of quality control and sample filtering and Table S1 for sample metrics.
Demographics and Sequencing Metrics by Cohort
| Phase | Phase 1 | Phase 2 | Phases 1 and 2 | ||||
|---|---|---|---|---|---|---|---|
| Cohort | TICGen | TAAICG (Broad) | TAAICG (UCLA) | TICGen | UTC | TSGENESEE | SSC Siblings |
| Samples (trios) sequenced | 325 | 149 | 37 | 92 | 18 | 181 | 1,184 |
| Samples (trios) passing QC for | 311 | 145 | 37 | 92 | 18 | 174 | 1,153 |
| Male:female (sex ratio) | 245:66(3.71) | 116:29(4.00) | 34:3 (11.33) | 73:19(3.84) | 14:4(3.50) | 144:30 (4.80) | 528:625 (0.84) |
| Paternal age[ | 33.05 ± 0.63 | 33.35 ± 0.85 | 31.85 ± 1.90 | 33.98 ± 1.15 | NA | NA | 32.6 ± 0.33 |
| Maternal age[ | 31.08 ± 0.57 | 31.64 ± 0.82 | 30.40 ± 1.46 | 31.16 ± 0.93 | NA | NA | 30.55 ± 0.29 |
| Simplex:multiplex[ | 264:30 | 128:13 | 35:0 | 72:0 | 17:1 | 61:59 | NA (all simplex) |
| Comorbid:non-comorbid[ | 216:86 | 101:39 | 26:10 | 64:22 | 0:18 | 84:64 | NA (all noncomorbid) |
| Exome array | Nimblegen EZ v2 | Agilent v1.1 | Nimblegen EZ v3 | IDT xGen | Nimblegen EZ v2 | ||
| Size of capture region (bp) | 44,001,748 | 32,760,120 | 63,564,965 | 33,337,769 | 44,001,748 | ||
| RefSeq hg19 coding region covered (bp) | 32,586,393 | 31,844,591 | 33,644,238 | 33,357,319 | 32,586,393 | ||
| RefSeq hg19 coding region covered (%) | 96.33 | 94.13 | 99.45 | 98.61 | 96.33 | ||
| Consensus region (bp)[ | 19,343,430 | ||||||
| Coding region covered in consensus(%) | 59.36 | 60.74 | 57.49 | 57.99 | 59.36 | ||
| Mean consensus callable size (million bp)[ | 18.97 ± 0.041 | 18.97 ± 0.059 | 18.32 ± 0.59 | 18.25 ± 0.20 | 18.87 ± 0.11 | 18.50 ± 0.0095 | 18.10 ± 0.064 |
Cohort characteristics as well as sequencing metrics are summarized per cohort and by phase. 95% confidence intervals are displayed as ±, where relevant. Agilent v1.1, Agilent SureSelect v1.1; IDTxGen, IDTxGen Exome Research Panel; Nimblegen EZ v2, Nimblegen EZ Exome v2; Nimblegen EZ v3, Nimblegen EZ Exome v3.
Not all samples have data; we based calculations on those having records (e.g., we did not have parental age records for UTC and TSGENESEE cohorts).
Simplex: parents unaffected with TD; multiplex: one or more parents have TD.
Comorbid: probands comorbid with ADHD/OCD; non-comorbid: probands not comorbid with ADHD/OCD.
We first calculated cumulative depth of coverages for each trio. For each cohort, we then generated a list of regions in which more than 50% of trios from that cohort have ≥20× joint coverage (i.e., each member of the trio has ≥20× depth at that position). We intersected these regions from each cohort to generate a list of consensus regions. To reduce any potential biases arising from differences in coverage, de novo burden analyses were restricted to these high-quality regions.
We estimated the cumulative depth of coverage for each trio in the consensus regions and calculated the mean and 95% CI using one-sample t test in R. See STAR Methods for details.
De Novo Sequence Mutation Rates by Category
| Mutation Rate per Base Pair in RefSeq Coding Regions (×10−8; ±95% CI) | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| TD (n = 777) | Controls (n = 1,153) | ||||||||||
| Cohort | Simplex (n = 577) | Multiplex (n = 103) | Unknown (n = 97) | Combined (n = 777) | Simplex Male (n = 461) | Simplex Female (n = 116) | Simplex Comorbid (n = 384) | Simplex Non-comorbid (n = 179) | Male (n = 528) | Female (n = 625) | Combined (n = 1,153) |
| Coding | 1.68 ± 0.17 | 1.58 ± 0.38 | 1.71 ± 0.44 | 1.67 ± 0.15 | 1.67 ± 0.20 | 1.69 ± 0.35 | 1.65 ± 0.21 | 1.72 ± 0.30 | 1.50 ± 0.17 | 1.55 ± 0.16 | 1.53 ± 0.12 |
| Syn | 0.42 ± 0.087 | 0.44 ± 0.23 | 0.41 ± 0.20 | 0.42 ± 0.075 | 0.42 ± 0.099 | 0.43 ± 0.18 | 0.43 ± 0.11 | 0.40 ± 0.16 | 0.39 ± 0.085 | 0.41 ± 0.085 | 0.40 ± 0.060 |
| Nonsyn | 1.25 ± 0.15 | 1.14 ± 0.35 | 1.30 ± 0.40 | 1.24 ± 0.13 | 1.25 ± 0.18 | 1.25 ± 0.32 | 1.22 ± 0.19 | 1.32 ± 0.27 | 1.10 ± 0.16 | 1.12 ± 0.14 | 1.11 ± 0.10 |
| Mis | 1.07 ± 0.14 | 1.03 ± 0.34 | 1.13 ± 0.38 | 1.08 ± 0.12 | 1.09 ± 0.16 | 1.01 ± 0.30 | 1.04 ± 0.18 | 1.13 ± 0.25 | 1.02 ± 0.15 | 1.02 ± 0.13 | 1.02 ± 0.098 |
| Mis3 | 0.60 ± 0.10 | 0.60 ± 0.23 | 0.69 ± 0.27 | 0.61 ± 0.090 | 0.62 ± 0.12 | 0.53 ± 0.20 | 0.62 ± 0.14 | 0.53 ± 0.17 | 0.53 ± 0.11 | 0.49 ± 0.093 | 0.51 ± 0.071 |
| LGD | 0.18 ± 0.057 | 0.10 ± 0.10 | 0.17 ± 0.13 | 0.17 ± 0.047 | 0.16 ± 0.062 | 0.25 ± 0.14 | 0.18 ± 0.072 | 0.19 ± 0.10 | 0.079 ± 0.039 | 0.11 ± 0.042 | 0.093 ± 0.029 |
| LGD + Mis3 | 0.78 ± 0.12 | 0.70 ± 0.24 | 0.86 ± 0.31 | 0.78 ± 0.10 | 0.78 ± 0.14 | 0.78 ± 0.22 | 0.80 ± 0.15 | 0.72 ± 0.19 | 0.61 ± 0.12 | 0.60 ± 0.11 | 0.60 ± 0.078 |
| LGD SNV | 0.073 ± 0.038 | 0.10 ± 0.10 | 0.082 ± 0.093 | 0.078 ± 0.033 | 0.057 ± 0.039 | 0.14 ± 0.11 | 0.082 ± 0.050 | 0.059 ± 0.058 | 0.027 ± 0.024 | 0.065 ± 0.033 | 0.048 ± 0.021 |
| LGD FS | 0.11 ± 0.042 | - | 0.085 ± 0.097 | 0.089 ± 0.034 | 0.10 ± 0.047 | 0.11 ± 0.098 | 0.096 ± 0.050 | 0.13 ± 0.085 | 0.051 ± 0.032 | 0.040 ± 0.026 | 0.045 ± 0.020 |
| In frame | 0.0045 ± 0.0088 | 0.026 ± 0.051 | - | 0.0067 ± 0.0093 | 0.0056 ± 0.011 | - | 0.0067 ± 0.013 | - | 0.020 ± 0.019 | 0.0042 ± 0.0082 | 0.011 ± 0.010 |
| Intolerant Mis | 0.13 ± 0.047 | 0.026 ± 0.051 | 0.19 ± 0.14 | 0.13 ± 0.040 | 0.12 ± 0.050 | 0.18 ± 0.12 | 0.12 ± 0.055 | 0.15 ± 0.090 | 0.077 ± 0.039 | 0.049 ± 0.029 | 0.062 ± 0.024 |
| Intolerant LGD | 0.069 ± 0.039 | 0.026 ± 0.051 | 0.055 ± 0.077 | 0.062 ± 0.031 | 0.064 ± 0.044 | 0.091 ± 0.089 | 0.070 ± 0.051 | 0.074 ± 0.065 | 0.020 ± 0.020 | 0.018 ± 0.017 | 0.019 ± 0.013 |
| Intolerant Nonsyn | 0.20 ± 0.063 | 0.051 ± 0.072 | 0.25 ± 0.16 | 0.19 ± 0.052 | 0.18 ± 0.068 | 0.27 ± 0.16 | 0.19 ± 0.076 | 0.22 ± 0.012 | 0.097 ± 0.043 | 0.067 ± 0.034 | 0.081 ± 0.027 |
We excluded any de novo variants located outside of the consensus regions and then calculated the mutation rate per base pair and 95% CI using t test in R. See also Figures S4 and S5. Comorbid, probands with TD and ADHD/OCD; damaging, LGD + Mis3; in frame, indel causing in-frame deletion or insertion (loss or gain of amino acids); intolerant LGD, de novo LGD variants occurring in genes with pLI greater than 0.9; intolerant Mis, de novo missense variants occurring in genes with missense Z score greater than 3.891; intolerant Nonsyn, intolerant Mis + intolerant LGD; LGD, likely gene disrupting (insertion of premature stop codon, disruption of canonical splice site, and insertion-deletion frameshift); LGD FS, insertion-deletion variant causing frameshift; LGD SNV, point mutation causing insertion of premature stop codon and disruption of canonical splice site; Mis, missense; Mis3, missense 3 (PolyPhen2 [HDIV] score R 0.957; Adzhubei et al., 2010, 2013); multiplex, one or more parents haveTD; non-comorbid, probands with TD only (without ADHD/OCD); Nonsyn, nonsynonymous; simplex, parents unaffected with TD; Syn, synonymous; unknown, phenotypic data unavailable for parents.
Figure 2.Combined Burden Analysis Identifies Differences in De Novo Rate in Simplex versus Multiplex Families
We defined a consensus region, consisting of a set of intervals with high-quality coverage across all samples. We then estimated the de novo mutation rates per base pair in this consensus region (STAR Methods). We converted the mutation rate per base pair to an expected rate per child (proband or control) by multiplying the mutation rate per base pair by the size of the total RefSeq hg19 “coding” region (33,828,798 bp).
(A) De novo variants are overrepresented in simplex TD trios only. LGD variants are significantly increased in simplex TD probands compared to SSC controls (RR 1.93; p = 0.0028; one-sided rate ratio test). Mis3 variants also trend toward enrichment (RR 1.18; p = 0.08). Therefore, de novo damaging variants as a group are overrepresented in simplex TD (RR 1.29; p = 0.0061). In contrast, de novo variants in any category are not significantly increased in multiplex TD families, though de novo damaging variants trend in that direction (RR 1.16; p = 0.26). Additionally, the rate of de novo LGD variants may be higher in simplex versus multiplex trios though the difference does not reach statistical significance (RR 1.73; p = 0.20).
(B) Restricting the analysis to de novo variants in mutation-intolerant genes (missense Z score ≥ 3.891 or pLI ≥ 0.9; Lek et al., 2016) reveals much larger effect sizes, particularly in simplex families. Comparing simplex to multiplex trios reveals significant differences for de novo nonsynonymous variants (RR 3.91; p = 0.023) and for de novo missense variants (RR 5.15; p = 0.047), but not for de novo LGD variants only (RR 2.66; p = 0.28; STAR Methods).
Damaging, LGD + Mis3; LGD, likely gene disrupting (insertion of premature stop codon, disruption of canonical splice site, and frameshift insertion-deletion variant); Mis, missense; Mis3, probably damaging missense variants (PolyPhen2 [HDIV] score ≥ 0.957; Adzhubei et al., 2010, 2013); Nonsyn, nonsynonymous; RR, rate ratio; Syn, synonymous. Error bars in (A) and (B) represent the 95% confidence interval (CI). When necessary, we truncated the lower bound of the CI to 0.
See Figures S2, S4, and S5 and Table S3.
Figure 3.De Novo CNV Burden Analysis
We called de novo CNVs from WES data and array data with CoNIFER (Krumm et al., 2012) and PennCNV (Wang et al., 2007), respectively. We utilized different methods for normalization to make the results comparable across different samples sets.
For the WES data (A), we normalized the de novo CNV rate by the number of discontinuous capture array intervals in each cohort (Figure S3A).
For the microarray data (B), we restricted de novo CNV calling to a set of SNPs shared across all arrays and further removed any outlier SNPs based on the LRR (Figure S3B; see STAR Methods for details). We compared each group with SSC sibling controls using a Wilcoxon rank-sum test in R. We also used the SSC probands as positive controls to validate our de novo calling pipelines. We used all de novo calls (confirmed and unconfirmed) in the burden analysis.
Both the WES data (A) and array data (B) demonstrate that de novo CNVs are significantly increased in TD compared to SSC controls and that de novo CNVs occur at approximately the same rate in TD and in ASD. Error bars in (A) and (B) represent the 95% confidence interval (CI). When necessary, we truncated the lower bound of the CI to 0.
See also Tables S3, S4, and S5.
Contributions of De Novo Events to TD Risk
| Percent of Children Carrying ≥1 Variant[ | Theoretical Rate per Child (±95% CI)[ | % of Cases with a Variant Mediating Risk (±95% CI)[ | % of Variants Carrying TD Risk (±95% CI)[ | |||
|---|---|---|---|---|---|---|
| TD Simplex (n = 577) | Control (n = 1,134) | TD Simplex (n = 577) | Control (n = 1,134) | |||
| LGD | 9.0% | 4.6% | 0.12 (0.082–0.16) | 0.061 (0.042–0.080) | 4.4% (1.8%–7.1%) | 49.5% (13.6%–83%) |
| Mis3 | 26.7% | 20.8% | 0.41 (0.33–0.48) | 0.34 (0.30–0.39) | 5.9% (1.6%–10.2%) | 15.3% (−5.9%–36.4%) |
| Damaging (LGD+Mis3) | 33.4% | 23.7% | 0.50 (0.42–0.57) | 0.39 (0.34–0.44) | 9.7% (5.2%–14.3%) | 22.3% (4.7%–41.5%) |
| Intolerant genes | 8.8% | 4.1% | 0.17(0.12–0.22) | 0.076 (0.055–0.098) | 4.7% (2.2%–7.1%) | 56.0% (26.0%–86.0%) |
| 2.9% | 1.4% | 1.29 × 10−7 (0.68 × 10−7–1.90 × 10−7) | 0.69 × 10−7 (0.34 × 10−7–1.05 × 10−7) | 1.5% (0.0%–3.0%) | 46.3% (−8.5%–101.1%) | |
| Damaging + | 35.5% | 25.0% | - | - | 10.5% (6.0%–15.2%) | - |
| Intolerant genes + | 11.8% | 5.6% | - | - | 6.2% (3.3%–9.2%) | - |
To estimate the contribution of de novo events to TD risk, we assessed the simplex TD and SSC controls used in both analyses of de novo sequence variants and de novo CNVs (577 TD simplex trios and 1,134 SSC sibling control trios; Table S1).
We calculated the percentage of children carrying de novo events as the total number of individuals carrying one or more de novo events/total number of individuals in the cohort; we denote the percentages of TD cases and SSC controls as p(TD) and p(Controls), respectively.
We estimated the theoretical rate per child (proband or control) for sequence variants as described in Figure 2. We obtained the mean and 95% CI by t test in R.
We estimated the percentage of cases with a variant mediating TD risk by p(TD) - p(Controls). We generated the 95% CI by bootstrapping.
We estimated the percentage of variants carrying TD risk and the corresponding 95% CI by two-sample t test in R, using the theoretical rate per child as input.
It is unclear how to estimate the theoretical de novo CNV rate per individual in WES data. We thus used the de novo CNV rate normalized by the number of continuous intervals captured to estimate the percentage of variants carrying TD risk (STAR Methods). To determine the percentage of cases with a de novo sequence variant or a de novo CNV mediating risk, we used the percent of children carrying ≥1 of any of these variants.
TD Risk Genes Identified in this Study
| Gene | LGD | Mis3 | p Value | q Value | q Value in Phase 1[ | Risk Status in Phase 1[ | Intolerant | pLI[ | Missense |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1.93 × 10−5 | 0.069 | 0.096 | hcTD | no | 0.02 | 1.27 | |
| 0 | 3 | 2.23 × 10−5 | 0.073 | 0.14 | pTD | yes (LGD and Mis) | 1.00 | 6.17 | |
| 0 | 2 | 6.70 × 10−5 | 0.11 | 0.72 | NA | yes (LGD) | 0.99 | 1.83 | |
| 0 | 2 | 1.13 × 10−4 | 0.16 | 0.22 | pTD | yes (LGD and Mis) | 1.00 | 5.04 | |
| 0 | 2 | 1.22 × 10−4 | 0.19 | 0.26 | pTD | no | 0.06 | 1.39 | |
| 0 | 2 | 1.29 × 10−4 | 0.22 | 0.98 | NA | yes (LGD) | 1.00 | 1.22 |
Six genes with recurrent de novo variants meet our thresholds for association: two of these are high-confidence TD (hcTD) risk genes (CELSR3 and WWC1; FDR ≤ 0.1), and four of these are probable TD (pTD) risk genes (OPA1, NIPBL, FN1, and FBN2; FDR ≤ 0.3). Four of these six TD risk genes are considered intolerant to variation; determined based on PLI and missense Z score.We excluded genes with only one de novo variant from this table (3 pTD genes; see Table S7). See also Figure S6 and Table S6.
Willsey et al., (2017).
Probability of being loss-of-function (LoF) intolerant, from Exome Aggregation Consortium (ExAC). pLI R 0.9 is considered intolerant.
Z score for missense variants, from ExAC. Mis_z R 3.891 is considered intolerant.
We previously identified WWC1 as an hcTD gene and CELSR3, NIPBL, and FN1 as pTD genes (Willsey et al., 2017).
| TD probands | SSC controls | |
|---|---|---|
| # of | 337 | 350 |
| # of hits in cell polarity | 17 | 17 |
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Biological Samples | ||
| TIC Genetics trios (n = 417) | Tourette International Collaborative Genetics Study | |
| TAAICG trios (n = 186) | Tourette Association of America International Consortium for Genetics | |
| TSGENESEE trios (n = 181) | Tourette Syndrome Genetics Southern and Eastern Europe Initiative | |
| UTC trios (n = 18) | Upsala Tourette Cohort | N/A |
| Deposited Data | ||
| Whole exome sequencing data from TIC Genetics trios in Phase 1 project (n = 325) | BioProject: PRJNA384374 | |
| Whole exome sequencing data from TSAICG trios in Phase 1 project (n = 186) | BioProject: PRJNA384389 | |
| Whole exome sequencing data from TIC Genetics trios in Phase 2 project (n = 92) | This paper | BioProject: PRJNA384374 |
| Genotyping data from TIC Genetics trios (n = 412) | This paper | BioProject: PRJNA384374 |
| Whole exome sequencing data from SSC control quartets (n = 1,184) | NDAR: DOI:10.15154/1149697 | |
| Genotyping data from SSC control quartets (n = 765) | ||
| Software and Algorithms | ||
| Genome Analysis Toolkit (GATK) | ||
| BWA-mem | ||
| Picard Tools | Broad Institute. | |
| Annovar | ||
| PLINK/SEQ | ||
| Primer Design | This paper. | |
| Python script code for data processing & analysis | This paper. | |
| R code for data analysis | This paper. | |
| TADA | ||
| GenomeStudio 2.0 | Illumina | |
| Other | ||
| 1000 Genomes GRCh37 hg19 genome build | N/A | |
| RefSeq hg19 gene annotation | N/A | |
| Intervals file for NimbleGen SeqCap EZ Exome v2 | Roche NimbleGen, Madison, WI, USA | |
| Intervals file for NimbleGen SeqCap EZ Exome v3 | Agilent Technologies, Santa Clara, USA | |
| Intervals file for Agilent SureSelect v1.1 | Roche NimbleGen, Madison, WI, USA | |
| Intervals file for IDT xGen | Integrated DNA Technologies, Inc., Skokie, Illinois, USA | |
| Coding regions only from RefSeq hg19 gene annotation | This paper. |