
Weighted Burden Analysis of Exome-Sequenced Case-Control Sample Implicates Synaptic Genes in Schizophrenia Aetiology.

David Curtis^1,2, Leda Coelewij^3, Shou-Hwa Liu^3, Jack Humphrey^3,4, Richard Mott^3.

Abstract

A previous study of exome-sequenced schizophrenia cases and controls reported an excess of singleton, gene-disruptive variants among cases, concentrated in particular gene sets. The dataset included a number of subjects with a substantial Finnish contribution to ancestry. We have reanalysed the same dataset after removing these subjects, and we have also included non-singleton variants of all types using a weighted burden test that assigns higher weights to variants predicted to have a greater effect on protein function. We investigated the same 31 gene sets as previously and also 1454 GO gene sets. The reduced dataset consisted of 4225 cases and 5834 controls. No individual variants or genes were significantly enriched in cases, but 13 of the 31 gene sets were significant after Bonferroni correction and the "FMRP targets" set produced a signed log p value (SLP) of 7.1. The gene within this set with the highest SLP, equal to 3.4, was FYN, which codes for a tyrosine kinase that phosphorylates glutamate metabotropic receptors and ionotropic NMDA receptors, thus modulating their trafficking, subcellular distribution and function. In the most recent GWAS of schizophrenia it was identified as a "prioritized candidate gene". Two of the subunits of the NMDA receptor that are substrates of FYN are coded for by GRIN1 (SLP = 1.7) and GRIN2B (SLP = 2.1). Of note, for some sets there was a substantial enrichment of non-singleton variants. Of the 1454 GO gene sets, three were significant after Bonferroni correction. Identifying specific genes and variants will depend on genotyping them in larger samples and/or demonstrating that they cosegregate with illness within pedigrees.


Keywords: Exome; FYN; FMRP target; Gene; Schizophrenia; Weighted burden test


Year: 2018 PMID: 29564678 PMCID: PMC5934462 DOI: 10.1007/s10519-018-9893-3

Source DB: PubMed Journal: Behav Genet ISSN: 0001-8244 Impact factor: 2.805

Introduction

Schizophrenia is a severe and disabling mental illness with onset typically in early adult life. It is associated with low fecundity but nevertheless remains fairly common, with a lifetime prevalence of around 1% (Power et al. 2013). A variety of types of genetic variation contribute to risk. Many common variants demonstrate association with small effect sizes, whereas extremely rare variants can have very large effect sizes. In total, 108 SNPs have been reported to be genome-wide significant with odds ratios (OR) of 1.1–1.2, and it is likely that many other variants will achieve statistical significance when larger samples are genotyped (Schizophrenia Working Group of the Psychiatric Genomics Consortium 2014). Weak effects from common variants may arise through a number of mechanisms. The variant itself may exert a direct effect at some point in the pathogenic process, it may have a more indirect effect through involvement in gene regulatory networks, or it may be in linkage disequilibrium with other variants that have a larger, direct effect (Boyle et al. 2017). A recent example of the last case is provided by SNPs in the HLA region which tag variant haplotypes of C4, the gene for complement component 4; the different haplotypes produce different levels of C4A expression, associated with an OR for schizophrenia of 1.3 (Sekar et al. 2016). Variants associated with small effect sizes will be subject to relatively little selection pressure and hence can remain common. By contrast, extremely rare variants such as some copy number variants (CNVs) or loss of function (LOF) variants of SETD1A may lead to a very high risk of developing schizophrenia (Deciphering Developmental Disorders Study 2017; Rees et al. 2014; Singh et al. 2016). A proportion of cases of schizophrenia seem to be due to such variants with large effect size arising as de novo mutations (DNMs) (Fromer et al. 2014; Singh et al. 2017).
Such variants are likely to be subject to strong selection pressure and may persist for only a small number of generations. Theoretically, variants acting recessively might persist in the population and still have reasonably large effect sizes, but attempts to identify these have to date been unsuccessful (Curtis 2015; Rees et al. 2015; Ruderfer et al. 2014). To focus on new or recent variants, the Swedish schizophrenia study of whole exome sequence data concentrated on what were termed ultra-rare variants (URVs), that is, variants which occurred in only a single subject and which were absent from ExAC. Some of these variants were annotated as damaging or disruptive to gene function; these variants, termed dURVs, were found to be commoner in cases than controls across all genes, with the effect concentrated in particular sets of genes including FMRP targets, synaptically localised genes and genes which were LOF intolerant (Genovese et al. 2016). The present study analyses this dataset further in order to consider whether rare non-singleton sequence variants, as well as singleton variants, contribute to schizophrenia risk. The dataset used in this study is the largest currently available sample of exome-sequenced schizophrenia cases and controls. It overlaps with a number of previously reported analyses. The full dataset consists of 4968 cases with schizophrenia and 6245 controls. Although the subjects were recruited in Sweden, some have a substantial Finnish component to their ancestry (Genovese et al. 2016). The earlier phase of this dataset consisted of 2045 cases and 2045 controls, and the primary analysis of these subjects revealed an excess among cases of very rare, disruptive mutations spread over a number of different genes, though concentrated in particular gene sets (Purcell et al. 2014).
This first phase of the dataset was also used for analyses which attempted to detect recessive effects and to identify Gene Ontology (GO) pathways with an excess of rare, functional variants among cases, but these did not produce statistically significant results (Curtis 2013, 2016). A subset of the full dataset from which cases with Finnish ancestry had been removed was used to demonstrate a method for deriving an exome-wide risk score and to demonstrate an association of schizophrenia with variants in miR-137 binding sites (Curtis 2017; Curtis and Emmett 2017). A genetically homogeneous subset of the full Swedish dataset was combined with a UK case-control association sample, and nonsynonymous variants with minor allele frequency (MAF) < 0.001 that were present on the Illumina HumanExome and HumanOmniExpressExome arrays were analysed (Leonenko et al. 2017). This revealed an enrichment of these variant alleles in LOF intolerant genes and FMRP targets. The present study uses a subset of the Swedish dataset after removal of subjects with a high Finnish ancestry component, in order to avoid artefactual results produced by population stratification. It also analyses all rare (MAF < 0.01) variants using a weighted burden test to identify genes and sets of genes associated with schizophrenia risk.
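As a minimal sketch of the URV definition used by the Swedish study, the following Python counts damaging URVs per subject. It assumes each variant is supplied as an ID, the list of subjects carrying it, and a precomputed damaging/disruptive flag; the input format, the variant ID strings and the name `count_durvs` are hypothetical, and the annotation criteria actually used by Genovese et al. (2016) are not reproduced here.

```python
from collections import Counter


def count_durvs(variants, exac_ids):
    """Count damaging ultra-rare variants (dURVs) per subject.

    variants: iterable of (variant_id, carrier_ids, is_damaging) tuples
        (hypothetical format), where carrier_ids lists every subject
        carrying the variant and is_damaging is a precomputed flag.
    exac_ids: set of variant IDs present in ExAC.
    """
    counts = Counter()
    for variant_id, carrier_ids, is_damaging in variants:
        carriers = set(carrier_ids)
        # URV: seen in exactly one subject and absent from ExAC
        is_urv = len(carriers) == 1 and variant_id not in exac_ids
        if is_urv and is_damaging:
            (subject,) = carriers  # unpack the single carrier
            counts[subject] += 1
    return counts


# Hypothetical toy input
variants = [
    ("1:1000:A:G", ["s1"], True),        # dURV in s1
    ("1:2000:C:T", ["s2"], True),        # dURV in s2
    ("1:3000:G:A", ["s1", "s3"], True),  # two carriers: not a URV
    ("2:500:T:C", ["s3"], False),        # URV, but not damaging
    ("2:900:A:T", ["s2"], True),         # present in ExAC: not a URV
]
durvs = count_durvs(variants, exac_ids={"2:900:A:T"})
```

Per-subject counts like these are what the dURV logistic regressions compare between cases and controls, with or without the total URV count as a covariate.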

Methods

The data analysed consisted of whole exome sequence variants from the Swedish schizophrenia association study, downloaded from dbGaP and comprising 4968 cases and 6245 controls (Genovese et al. 2016). The dataset was managed and annotated using the GENEVARASSOC program, which accompanies SCOREASSOC (https://github.com/davenomiddlenamecurtis/geneVarAssoc). Version hg19 of the reference human genome sequence and RefSeq genes were used to select variants on a gene-wise basis. Members of the protocadherin gamma gene cluster, whose transcripts overlap each other but which are entered separately in RefSeq, were treated as a single gene labelled PCDHG. A number of QC processes were applied. Variants were excluded if they did not have a PASS in the Variant Call Format (VCF) information field, and individual genotype calls were excluded if they had a quality score less than 30. Sites were also excluded if more than 10% of genotypes were missing or of low quality in either cases or controls, or if the heterozygote count was smaller than both homozygote counts in both cohorts. As previously reported (Curtis 2017), preliminary gene-wise weighted burden tests revealed that several genes had an apparent excess of rare, protein-altering variants in cases, but these results were driven by variants which were reported in ExAC to be commoner in Finnish than in non-Finnish Europeans (Lek et al. 2016). Accordingly, subjects with an excess of alleles more frequent in Finns were identified using the methods previously described (Curtis 2017) and removed from the dataset; these comprised 743 cases and 411 controls. Once this had been done, leaving a sample of 4225 cases and 5834 controls, the gene-wise weighted burden test results conformed well to what would be expected under the null hypothesis, with no evidence for inflation of the test statistic across the majority of genes not thought to be implicated in disease.
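The site-level QC rules described above can be expressed as a small filter. This is a sketch under stated assumptions, not GENEVARASSOC's implementation: each genotype is taken to be a pair of (alternate-allele count or None, quality score), and the names `keep_site`, `MIN_GENOTYPE_QUALITY` and `MAX_MISSING_FRACTION` are hypothetical.

```python
MIN_GENOTYPE_QUALITY = 30   # calls below this quality are treated as missing
MAX_MISSING_FRACTION = 0.10  # more than 10% missing in either cohort fails


def _usable_calls(genotypes):
    """Map (alt_allele_count or None, quality) pairs to calls, dropping low quality."""
    return [call if call is not None and qual >= MIN_GENOTYPE_QUALITY else None
            for call, qual in genotypes]


def _het_deficit(calls):
    """True if heterozygotes are fewer than both homozygote classes."""
    het = calls.count(1)
    return het < calls.count(0) and het < calls.count(2)


def keep_site(filter_value, case_genotypes, control_genotypes):
    """Apply the PASS, missingness and heterozygote-count rules (hypothetical helper)."""
    if filter_value != "PASS":
        return False
    cohorts = [_usable_calls(case_genotypes), _usable_calls(control_genotypes)]
    for calls in cohorts:
        missing = sum(c is None for c in calls) / len(calls)
        if missing > MAX_MISSING_FRACTION:
            return False
    # Exclude only if the heterozygote deficit is seen in both cohorts
    if all(_het_deficit(calls) for calls in cohorts):
        return False
    return True
```

For a typical rare variant the alternate homozygote count is zero, so the heterozygote rule only removes unusual sites where heterozygotes are scarcer than both homozygote classes.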
The tests previously carried out for an excess of dURVs among cases (Genovese et al. 2016) were performed on both the full and reduced datasets, with and without covariates consisting of the total URV count and the first 20 principal components from the SNP and indel genotypes. Weighted burden analysis of genes and gene sets, as described below, was carried out using SCOREASSOC, which analyses all variants simultaneously and can accord each variant a different weight according to its MAF and its predicted function (Curtis 2012, 2016). Each variant was annotated using VEP, PolyPhen and SIFT (Adzhubei et al. 2013; Kumar et al. 2009; McLaren et al. 2016). GENEVARASSOC was used to generate the input files for SCOREASSOC and the default weights were used, for example 5 for a synonymous variant and 20 for a stop gained variant, except that 10 was added to the weight if the PolyPhen annotation was possibly or probably damaging, and a further 10 if the SIFT annotation was deleterious. The full set of weights is shown in Supplementary Table S1. SCOREASSOC also weights rare variants more highly than common ones, but because it is well established that no common variants have a large effect on the risk of schizophrenia we excluded variants with MAF > 0.01 in the cases and in the controls, so in practice weighting by rarity had negligible effect. For each subject a gene-wise risk score was derived as the sum of the variant-wise weights, each multiplied by the number of alleles of the variant which the given subject possessed. These scores were then compared between cases and controls using a t test. To indicate the strength of evidence in favour of an excess of rare, functional variants in cases, we took minus the logarithm base ten of the p value from this t test and gave it a positive sign if the average weighted sum was higher in cases and a negative sign if it was higher in controls, to produce a signed log p (SLP).
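To make the gene-wise statistic concrete, the sketch below computes variant weights, per-subject scores and a signed log p. It is not SCOREASSOC's code. Only the two default weights quoted above (5 for synonymous, 20 for stop gained) are included; the VEP-style consequence labels are an assumption, and any other weight would have to come from Supplementary Table S1. The p value uses a normal approximation to the t distribution, which is an assumption that is reasonable with several thousand subjects per group but is not exact.

```python
import math
from statistics import mean, variance

# Only the two defaults quoted in the text; the full table is Supplementary
# Table S1. Consequence labels are assumed to be VEP-style terms.
BASE_WEIGHTS = {"synonymous_variant": 5, "stop_gained": 20}


def variant_weight(consequence, polyphen_damaging, sift_deleterious,
                   base_weights=BASE_WEIGHTS):
    """Base weight for the consequence, plus 10 for each damaging prediction."""
    weight = base_weights[consequence]
    if polyphen_damaging:   # PolyPhen "possibly" or "probably damaging"
        weight += 10
    if sift_deleterious:    # SIFT "deleterious"
        weight += 10
    return weight


def gene_score(allele_counts, weights):
    """One subject's score: sum of weight x alleles carried (0, 1 or 2)."""
    return sum(w * n for w, n in zip(weights, allele_counts))


def signed_log_p(case_scores, control_scores):
    """Signed -log10 p from a two-sample Student's t test on the scores.

    The two-tailed p uses a normal approximation to the t distribution.
    Positive if the mean score is higher in cases, negative otherwise.
    """
    n1, n2 = len(case_scores), len(control_scores)
    m1, m2 = mean(case_scores), mean(control_scores)
    pooled_var = ((n1 - 1) * variance(case_scores)
                  + (n2 - 1) * variance(control_scores)) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    p = math.erfc(abs(t) / math.sqrt(2))   # two-tailed normal p value
    slp = -math.log10(p)
    return slp if m1 > m2 else -slp
```

An exact Student's t p value, for example from `scipy.stats.ttest_ind`, would be the natural substitute where SciPy is available; with samples of this size the two give very similar SLPs.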
To explore the contribution of singleton variants, three sets of variants were used for the analyses of gene sets: singleton variants, which were observed in only a single subject and not in ExAC; non-singleton variants, observed in more than one subject (though still with MAF < 0.01 in cases and/or controls); and all variants, consisting of the singleton and non-singleton variants combined. Weighted burden analysis within sets of genes was carried out using PATHWAYASSOC, which for each subject sums the gene-wise scores to produce an overall score for the gene set. These set-wise scores can then be compared between cases and controls using a t test. This approach has been shown to produce appropriate p values when applied to real data, supported by permutation testing (Curtis 2016). This analysis was applied to the 31 gene sets used in the Swedish study, separately using singleton, non-singleton and all variants. The analysis was also applied using all variants to the 1454 “all GO gene sets, gene symbols” pathways downloaded from the Molecular Signatures Database at http://www.broadinstitute.org/gsea/msigdb/collections.jsp (Subramanian et al. 2005). Logistic regression analyses of dURVs were carried out using R (R Core Team 2014). Weighted burden tests for genes and gene sets were carried out using SCOREASSOC and PATHWAYASSOC. Results from these programs are expressed as a signed log p (SLP), which is positive if there is an excess of variants among cases and negative if there is an excess among controls. Thus, an SLP of 3 corresponds to an excess of variants among cases with two-tailed p = 10⁻³.
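The split into singleton and non-singleton variants, and a PATHWAYASSOC-style set score, might be sketched as follows. As worded, a variant seen in one subject but present in ExAC fits neither definition; this sketch leaves such variants unclassified, which is an assumption. The function names and the dictionary-of-lists layout are hypothetical.

```python
def classify_variant(n_carriers, in_exac):
    """Return 'singleton', 'non-singleton', or None if neither definition fits."""
    if n_carriers == 1 and not in_exac:
        return "singleton"      # one subject only, absent from ExAC
    if n_carriers > 1:
        return "non-singleton"  # seen in more than one subject
    return None                 # e.g. one carrier but present in ExAC


def set_scores(gene_scores, gene_set):
    """Per-subject set-wise score: the sum of gene-wise scores over the set.

    gene_scores maps gene symbol -> list of per-subject scores, with every
    list in the same subject order. Genes in gene_set without scores are
    skipped.
    """
    genes = [g for g in gene_set if g in gene_scores]
    n_subjects = len(next(iter(gene_scores.values())))
    return [sum(gene_scores[g][i] for g in genes) for i in range(n_subjects)]
```

Gene-wise scores built from only singleton, only non-singleton, or both classes of variant would give the three versions of each set score, which are then compared between cases and controls with a t test as for genes.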

Results

Preliminary analysis of the whole dataset (i.e. all individuals before excluding those with Finnish ancestry), using logistic regression to test for an excess of dURVs among cases, was significant (p = 8.7 × 10⁻¹⁰) when the total URV count and principal components were included as covariates. However, without covariates this analysis was only marginally significant (p = 0.031). Further investigation showed that subjects with a substantial Finnish component to their ancestry had a smaller number of URVs than those who did not. Cases tended to have a larger number of dURVs than controls, but only relative to the total number of URVs, and more cases than controls had a substantial Finnish ancestry component. Thus, in the whole sample the relative excess of dURVs among cases was almost completely masked by the fact that more cases had Finnish ancestry and that these cases had a smaller absolute number of URVs, meaning that overall there was only a small excess in the absolute number of dURVs among cases. Including the total URV count or the principal components or both as covariates allowed the relative excess among cases to become apparent. The analysis was then repeated on the reduced dataset without those subjects with a substantial Finnish ancestry component. Once this had been done, there was a significant absolute excess of dURVs among cases (p = 2.7 × 10⁻⁵), without needing to include either total URV count or principal components as covariates. The weighted burden tests evaluated 1,042,483 valid variants in 22,023 genes. As described in the Methods section, in preliminary analyses using the full dataset a number of genes yielded high SLPs. An example was COMT, with SLP = 7.4. On inspection, this gene-wise result appeared to be largely driven by SNP rs6267, which was heterozygous in 51/6242 controls and 94/4962 cases (OR 2.3, p = 8 × 10⁻⁷). However, ExAC reports this variant to have MAF = 0.002 in non-Finnish Europeans but MAF = 0.05 in Finns.
Hence, its increased frequency among cases appeared to be due to the excess of cases with Finnish ancestry. Once all subjects with a substantial Finnish ancestry component were excluded, the SLP for COMT fell to 1.7, and for rs6267 there were 36/5831 heterozygous controls and 36/4221 heterozygous cases (OR 1.4, p = 0.2). A similar effect was observed for other genes with excessively high SLPs in the full dataset but not in the reduced dataset, suggesting that removing subjects with substantial Finnish ancestry produced a satisfactorily homogeneous dataset. QQ plots for the gene-wise analyses using the reduced dataset are shown in Fig. 1. All of the plots are symmetrical, indicating that the test is unbiased. When only singleton variants are used, the gene-wise tests are somewhat underpowered and the gradient is less than 1. However, for the tests using non-singleton variants or all variants the SLPs almost exactly follow the distribution expected under the null hypothesis. One outlier is apparent. This is caused by the gene CDCA8, which produces an SLP of −5.49 with all variants. Further inspection showed that this result was mainly driven by 22 highly weighted variant alleles among controls but only five among cases. For a gene-wise test to be exome-wide significant with 22,023 genes, the absolute value of the SLP would need to exceed 5.64, so this result is still within chance expectation.
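The odds ratios quoted for rs6267 can be checked from the heterozygote counts with a simple carrier-based odds ratio. This sketch covers only that arithmetic; the text does not state exactly how the reported ORs and p values were computed, and `carrier_odds_ratio` is a hypothetical helper.

```python
def carrier_odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Odds of carrying the allele in cases divided by the odds in controls."""
    case_odds = case_carriers / (n_cases - case_carriers)
    control_odds = control_carriers / (n_controls - control_carriers)
    return case_odds / control_odds


# rs6267 heterozygote counts quoted in the text
full_or = carrier_odds_ratio(94, 4962, 51, 6242)     # full dataset
reduced_or = carrier_odds_ratio(36, 4221, 36, 5831)  # Finnish-ancestry subjects removed
print(f"full: {full_or:.1f}, reduced: {reduced_or:.1f}")  # prints: full: 2.3, reduced: 1.4
```

With equal carrier counts in the reduced dataset, the remaining OR of about 1.4 simply reflects the larger number of controls.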

Fig. 1

QQ plots of observed versus expected gene-wise SLP using (a) only singleton variants, (b) non-singleton variants and (c) both
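The expected axis of a QQ plot such as Fig. 1 needs the null distribution of SLPs. One way to obtain it, sketched here as an assumption because the text does not give the authors' procedure, is to treat the one-sided p value for an excess in cases as uniform under the null and convert the plotting positions (i − 0.5)/n into SLPs. Observed SLPs, sorted in ascending order, would then be plotted against these values; `expected_slps` is a hypothetical name.

```python
import math


def expected_slps(n_tests):
    """Expected SLPs for n independent null tests, sorted in ascending order.

    Uses plotting positions q_i = (i - 0.5) / n for the one-sided p value q
    of an excess in cases. Two-tailed p = 2 * min(q, 1 - q), and the sign is
    positive when q < 0.5 (cases higher).
    """
    slps = []
    for i in range(1, n_tests + 1):
        q = (i - 0.5) / n_tests
        if q < 0.5:
            slps.append(-math.log10(2 * q))      # excess in cases: positive
        else:
            slps.append(math.log10(2 * (1 - q)))  # excess in controls: negative
    return sorted(slps)
```

Because the null SLP distribution is symmetric about zero, an unbiased test gives a QQ plot that is symmetric too, which is the property noted for Fig. 1.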

The results for the 31 gene sets which had previously been used in the Swedish study are shown in Table 1. Using the weighted burden test many, though not all, of the sets show an excess of variants among cases. For neurons, pLI09, fmrp and mir137 the non-singleton variants make a substantial contribution, but for psd95, rbfox13 and rbfox2 the bulk of the effect comes from the singleton variants alone. Given that there are 31 sets, a simple Bonferroni correction would mean that a set could be declared statistically significant if the SLP using all variants exceeded −log10(0.05/31) = 2.8, although this threshold should be regarded as conservative because the sets overlap each other. For the 13 sets with SLP > 2.8 using all variants, the genes with the highest gene-wise SLPs are shown in Table 2. As expected, there is some overlap between the sets, with several genes contributing to more than one set. The gene with the highest gene-wise SLP in the fmrp set is FYN (SLP = 3.4), and it is also a member of 6 other sets. FYN codes for a tyrosine kinase which phosphorylates glutamate metabotropic receptors and ionotropic NMDA receptors, thereby modulating their trafficking, subcellular distribution and function (Mao and Wang 2016a). In the most recent GWAS of schizophrenia FYN was identified as a “prioritized candidate gene” and an intronic marker, rs7757969, was significant at p = 4.8 × 10⁻⁸ (Li et al. 2017). The activity of FYN is regulated by dopamine DRD2 receptors (Mao and Wang 2016b). FYN is involved in neuronal apoptosis, brain development and synaptic transmission, and lower expression has been observed in the platelets of patients with schizophrenia compared with controls (Ali and Salter 2001; Du et al. 2012; Hattori et al. 2009).
Two of the subunits of the NMDA receptor which are substrates of FYN are coded for by GRIN1 (SLP = 1.7) and GRIN2B (SLP = 2.1). In all three of these genes, the signal seems to be produced from a number of highly weighted variants which are individually commoner in cases but all are very rare, with MAF < 0.001 even among cases, so it is not possible to identify any obvious candidate variants.
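The Bonferroni thresholds used in this section follow directly from the SLP definition: with n tests at overall α = 0.05, each test needs p < α/n, i.e. SLP > log10(n/α). The sketch below computes this for the 31 candidate sets, the 1454 GO sets and the 22,023 genes; `bonferroni_slp_threshold` is a hypothetical helper.

```python
import math


def bonferroni_slp_threshold(n_tests, alpha=0.05):
    """|SLP| needed for significance at alpha after Bonferroni correction.

    The per-test p threshold is alpha / n_tests, so the SLP threshold is
    -log10(alpha / n_tests) = log10(n_tests / alpha).
    """
    return math.log10(n_tests / alpha)


for label, n in [("candidate gene sets", 31), ("GO gene sets", 1454),
                 ("genes", 22023)]:
    print(f"{label}: {bonferroni_slp_threshold(n):.2f}")
```

These round to the 2.8, 4.5 and 5.64 used in the text. Because the gene sets overlap, the tests are not independent, which is why these thresholds are described as conservative.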

Table 1

Results showing SLPs obtained for the gene sets used in the original analysis of the Swedish schizophrenia study (Genovese et al. 2016)

Gene set	Symbol (number of genes in set)	Singleton variants	Non-singleton variants	Both
OMIM intellectual disability (Hamosh et al. 2005)	alid (107)	0.2	1.0	1.1
Expression specific to brain (Fagerberg et al. 2014)	brain (2660)	4.0	1.3	3.1
Bound by CELF4 (Wagnon et al. 2012)	celf4 (2675)	3.1	1.7	3.7
Missense-constrained (Samocha et al. 2014)	constrained (1005)	3.8	2.0	4.8
Involved in developmental disorder (Deciphering Developmental Disorders Study 2017)	dd (93)	2.2	2.4	3.7
De novo variants in autism (Fromer et al. 2014)	denovo.aut (2927)	2.5	2.9	4.8
De novo variants in coronary heart disease (Fromer et al. 2014)	denovo.chd (249)	0.8	1.7	2.7
De novo variants in epilepsy (Fromer et al. 2014)	denovo.epi (322)	1.2	0.7	1.6
De novo duplications in ASD (Kirov et al. 2012)	denovo.gain.asd (1365)	0.9	1.2	1.8
De novo duplications in bipolar disorder (Kirov et al. 2012)	denovo.gain.bd (180)	0.8	0.5	1.2
De novo duplications in schizophrenia (Kirov et al. 2012)	denovo.gain.scz (200)	0.2	− 0.1	0.1
De novo variants in intellectual disability (Fromer et al. 2014)	denovo.id (251)	0.5	1.8	2.8
De novo deletions in ASD (Kirov et al. 2012)	denovo.loss.asd (1179)	3.1	0.2	1.3
De novo deletions in bipolar disorder (Kirov et al. 2012)	denovo.loss.bd (130)	1.4	− 0.3	0.2
De novo deletions in schizophrenia (Kirov et al. 2012)	denovo.loss.scz (246)	0.6	0.1	0.5
De novo variants in schizophrenia (Fromer et al., 2014)	denovo.scz (770)	1.7	1.3	2.3
Bound by FMRP (Darnell et al. 2011)	fmrp (1244)	7.0	3.3	7.2
Implicated by GWAS (Schizophrenia Working Group of the Psychiatric Genomics Consortium 2014)	gwas (91)	1.2	0.8	1.7
Targets of microRNA-137 (Robinson et al. 2015)	mir137 (3260)	2.5	4.1	5.3
Expression specific to neurons (Cahoy et al. 2008)	neurons (4747)	3.4	4.3	6.9
NMDAR and ARC complexes (Kirov et al. 2012)	nmdarc (80)	1.8	− 0.4	0.1
Loss-of-function intolerant (Lek et al. 2016)	pLI09 (3488)	4.2	3.3	6.2
PSD-95 (Bayés et al. 2011)	psd95 (120)	2.7	− 0.2	0.5
Bound by RBFOX 1 or 3 (Weyn-Vanhentenryck et al. 2014)	rbfox13 (3445)	5.7	1.3	4.2
Bound by RBFOX 2 (Weyn-Vanhentenryck et al. 2014)	rbfox2 (3068)	6.4	1.0	4.1
Synaptic (Pirooznia et al. 2012)	synaptome (1887)	3.9	2.2	5.4
Escape X-inactivation (Cotton et al. 2013)	x.escape (213)	0.5	0.9	1.6
X-linked intellectual disability, Genetic Services Laboratories of the University of Chicago (Gécz et al. 2009; Moeschler 2008; Moeschler et al. 2006; Rauch et al. 2006)	xlid.chicago (77)	− 0.1	1.8	1.4
X-linked intellectual disability, Greenwood Genetic Centre (Moeschler et al. 2006)	xlid.gcc (114)	− 0.2	1.8	1.3
X-linked intellectual disability, OMIM (Hamosh et al. 2005)	xlid.omim (57)	− 0.7	0.6	0.2
X-linked intellectual disability (combined)	xlid (122)	− 0.3	1.8	1.2

The lists of genes were obtained directly from the first author. The symbol used is the same as that used for the name of the file containing the list

Table 2

Gene-wise results for the genes with highest gene-wise SLPs in all sets with set-wise SLP > 2.8

brain	SLP	celf4	SLP	constrained	SLP	dd	SLP	denovo.aut	SLP	denovo.id	SLP	fmrp	SLP
DGKI	3.3	ADAMTSL1	4.3	KLHL11	3.7	GRIN2B	2.1	ADAMTSL1	4.3	ARFGEF2	2.5	FYN	3.4
SLC6A17	3.1	HPRT1	4.0	TMEM102	2.3	PACS1	2.0	TMC4	4.0	CDC42BPB	2.2	SLC6A17	3.1
AAK1	2.9	KLHL11	3.7	TIGD5	2.3	KCNQ3	1.8	OR10Z1	3.2	EPHB1	2.2	AAK1	2.9
EFNB3	2.8	PLK4	3.4	HERC1	2.3	ANKRD11	1.7	VAMP2	2.4	GRIN2B	2.1	AFF3	2.8
NDST3	2.7	DGKI	3.3	AGO3	2.2	KIF1A	1.6	FOCAD	2.4	TMPRSS12	1.8	PTK2	2.7
GLT6D1	2.6	GMCL1	3.3	DGKZ	2.2	KCNH1	1.5	C20orf96	2.3	KCNQ3	1.8	PREX2	2.5
TMEM174	2.5	CCDC112	3.1	SLIT1	2.2	DYNC1H1	1.3	HERC1	2.3	MBD5	1.7	ARFGEF2	2.5
HCRTR2	2.4	SLC6A17	3.1	DNMT3A	2.1	KAT6A	1.3	AGO3	2.2	TNK2	1.7	VAMP2	2.4
EPHA5	2.4	AAK1	2.9	KDM5C	2.1			RNF25	2.2	SETDB2	1.6	HERC1	2.3
PACSIN1	2.3	AFF3	2.8	TFAP2A	2.1			CDC42BPB	2.2	KCNH1	1.5	PACSIN1	2.3

The top ten genes are shown, provided that the gene-wise SLP was at least 1.3, equivalent to p < 0.05

Figure 2 shows the QQ plot for the set-wise analyses using the GO gene sets. Given that there is overlap of genes between sets, the SLPs are non-independent and it is expected that the gradient of the QQ plot will be less than 1. For those sets with a negative SLP this is indeed the case, and these results are in accordance with the expectation under the null hypothesis. However, the gradient becomes steeper for sets with positive SLPs, and this can be interpreted as showing that some sets have an excess of variants among cases above that which would be expected by chance. Given that 1454 GO gene sets were tested, a simple Bonferroni correction would mean that a test could be declared “exome-wide significant” if it achieved an SLP exceeding −log10(0.05/1454) = 4.5. Three sets did achieve this threshold. However, given that the set-wise SLPs are not independent, a Bonferroni correction might be viewed as conservative, and Table 3 shows all sets achieving SLP > 3. The full results are presented in Supplementary Table S2. The most significant set, INTRACELLULAR_SIGNALING_CASCADE with SLP = 5.4, contains FYN and two other genes with gene-wise SLP > 3, S1PR4 (SLP = 3.7) and RTKN (SLP = 3.2). S1PR4 codes for the type 4 receptor for sphingosine-1-phosphate, and the mouse strain carrying the mutation with genotype S1pr4^tm1Dgen/S1pr4^+ has decreased prepulse inhibition as a phenotype (http://www.informatics.jax.org/allele/genoview/MGI:3606610) (Blake et al. 2017; The Jackson Laboratory, n.d.).
RTKN codes for rhotekin, a scaffold protein that interacts with GTP-bound Rho proteins. Again, inspecting results for individual variants within these genes did not reveal any obvious candidates. The full results for all genes and all gene sets can be downloaded at: http://www.davecurtis.net/downloads/SSS2WeightedBurdenAnalysisResults.tgz.

Fig. 2

QQ plot for set-wise SLPs for GO sets against the expected SLP if all sets were non-overlapping and independent

Table 3

All GO gene sets, of the 1454 tested, which produced a set-wise SLP > 3

GO gene set	SLP
INTRACELLULAR_SIGNALING_CASCADE	5.39
CHROMOSOME_ORGANIZATION_AND_BIOGENESIS	4.70
ORGAN_DEVELOPMENT	4.64
SIGNAL_TRANSDUCTION	4.37
ION_BINDING	4.34
POSITIVE_REGULATION_OF_CELLULAR_PROCESS	4.22
REGULATION_OF_CELLULAR_METABOLIC_PROCESS	4.11
RHO_GUANYL_NUCLEOTIDE_EXCHANGE_FACTOR_ACTIVITY	4.11
CELL_DEVELOPMENT	4.09
CYTOPLASM	4.00
REGULATION_OF_METABOLIC_PROCESS	3.99
POSITIVE_REGULATION_OF_BIOLOGICAL_PROCESS	3.86
STRUCTURE_SPECIFIC_DNA_BINDING	3.69
PROTEIN_METABOLIC_PROCESS	3.38
TRANSMEMBRANE_RECEPTOR_ACTIVITY	3.37
SEXUAL_REPRODUCTION	3.34
FEEDING_BEHAVIOR	3.22
REGULATION_OF_PROTEIN_AMINO_ACID_PHOSPHORYLATION	3.22
NEGATIVE_REGULATION_OF_BIOLOGICAL_PROCESS	3.22
CELL_ACTIVATION	3.15
INTEGRAL_TO_MEMBRANE	3.15
REGULATION_OF_PHOSPHORYLATION	3.13
INTRINSIC_TO_MEMBRANE	3.10
GAMETE_GENERATION	3.10
REGULATION_OF_DEVELOPMENTAL_PROCESS	3.09
ESTABLISHMENT_AND_OR_MAINTENANCE_OF_CHROMATIN_ARCHITECTURE	3.09
MEMBRANE	3.06
BIOPOLYMER_METABOLIC_PROCESS	3.02
NEGATIVE_REGULATION_OF_CELLULAR_PROCESS	3.01


Discussion

This analysis identifies a number of sets of genes that meet Bonferroni-corrected criteria for statistical significance. It differs from previous analyses in several ways. In contrast to the original analysis of the Swedish dataset (Genovese et al. 2016), it uses non-singleton as well as singleton variants, and it clearly demonstrates that these non-singleton variants contribute to risk. This is extremely important for the prospects of identifying rare risk variants for schizophrenia. If only unique variants conferred risk, that is, only variants which occur independently as de novo mutations and then disappear after a small number of generations, then it would not be possible to identify any single variant as definitively affecting risk. At best, one could perhaps identify classes of variant occurring in particular genes. Without being able to conclude that any particular variant affected risk, one could not carry out functional studies in model systems with confidence that one was indeed studying a true risk variant. Additionally, if only unique variants contributed to risk, then strategies using linkage disequilibrium to implicate untyped variants could not succeed. If, on the other hand, there are risk variants which survive and spread in the population, then these could potentially be tagged by haplotypes of common SNPs and imputed from GWAS data, in a way similar to that used to impute C4 risk variants (Sekar et al. 2016). Alternatively, population sequencing may soon become cheap and accurate enough to identify these rare variants directly. This study differs from both the Swedish study (Genovese et al. 2016) and the Swedish-UK study (Leonenko et al. 2017) in that it uses a homogeneous dataset. The original study did not exclude the subjects with a substantial Finnish ancestry component, whereas the Swedish-UK study did use a homogeneous subset of the Swedish subjects but then combined them with a UK sample.
This meant that both studies needed to incorporate principal components to control for population stratification, which to some extent complicates the interpretation of their results. For example, the highly significant enrichment for dURVs reported in the first study only becomes apparent when covariates are included. In the Swedish-UK study, the most highly significant variant (p = 3.4 × 10⁻⁷), which occurs in the MCPH gene, has MAF of 0.0046 in cases and 0.0012 in controls, meaning that the unadjusted risk ratio is approximately 3.8. However, after multivariate analysis including covariates the OR is reported as being only 1.2. By contrast, the reduced dataset we have used appears to be sufficiently homogeneous that the test statistic performs as expected without requiring any adjustment for population stratification. This allows a simple, straightforward interpretation of the results obtained. Another way in which our analysis differs is that it includes all variants in a single analysis. Variants are assigned different weights according to an arbitrary, pre-specified set of weights designed to emphasise those variants more likely to affect gene function. This meant that we carried out only a single analysis for each gene or set of genes, reducing the need for correction for multiple testing. Our analyses utilised 1,042,483 variants, compared with the 112,950 used in the Swedish-UK study. Using our method, 13 of the 31 candidate gene sets and 3 of the 1454 GO sets meet formal standards for statistical significance using a conservative Bonferroni correction. As in the other studies, none of the results for individual genes reaches formal standards for statistical significance, although the results obtained for FYN are possibly of interest. It seems likely that our results are detecting a real signal originating from rare variants concentrated within some of the genes that are members of the gene sets with high SLPs.
These sets overlap each other to a considerable extent and it is difficult to tease out which ones best define a group of schizophrenia risk genes. An attempt to do this formally using exome-wide risk scores did not produce definitive results (Curtis 2017). It should be noted that different sets might be implicated for different reasons. For example, it may be that the high SLP for targets of miR-137 occurs because disruption of the regulation of these genes by miR-137 can lead to increased risk of schizophrenia, as supported by the association of schizophrenia with markers for miR-137 and with variants in its binding sites (Curtis and Emmett 2017; Olde Loohuis et al. 2017). On the other hand, there is no reported association of FMRP itself with schizophrenia and the high SLP for its targets may simply reflect that this identifies a group of genes whose mRNA is localised to the synapse. In any event, it is clear that with samples currently available we are only able to identify very broad gene sets but not yet specific genes. With increased sample sizes it will become possible to identify specific genes and variants which have a moderate or large effect on risk. However such variants, although not singletons, will still be very rare and serious attention should be focussed on complementary approaches to confirm them. One such approach would be to use exome sequence data from affected subjects to provide reference haplotypes for imputation into large GWAS datasets, analogously to the way C4 variants implicating risk were identified (Sekar et al. 2016). Another would be to search for affected relatives of subjects with candidate variants in order to see if the variants cosegregate with disease, a strategy which was successful in implicating RBM12 in the aetiology of psychosis (Curtis 2011; Steinberg et al. 2017). 
If and when specific variants are identified as having substantial effects on risk, they can be incorporated into model systems in order to gain insight into the mechanisms affecting the development of schizophrenia.
Electronic supplementary material: Supplementary material 1 (DOCX 15 KB); Supplementary material 2 (XLSX 48 KB).

51 in total

1. NMDA receptor regulation by Src kinase signalling in excitatory synaptic transmission and plasticity. (Review)

Authors: D W Ali; M W Salter
Journal: Curr Opin Neurobiol Date: 2001-06 Impact factor: 6.627

2. Assessing the contribution family data can make to case-control studies of rare variants.

Authors: David Curtis
Journal: Ann Hum Genet Date: 2011-06-16 Impact factor: 1.670

3. Genome-wide association analysis identifies 30 new susceptibility loci for schizophrenia.

Authors: Zhiqiang Li; Jianhua Chen; Hao Yu; Lin He; Yifeng Xu; Dai Zhang; Qizhong Yi; Changgui Li; Xingwang Li; Jiawei Shen; Zhijian Song; Weidong Ji; Meng Wang; Juan Zhou; Boyu Chen; Yahui Liu; Jiqiang Wang; Peng Wang; Ping Yang; Qingzhong Wang; Guoyin Feng; Benxiu Liu; Wensheng Sun; Baojie Li; Guang He; Weidong Li; Chunling Wan; Qi Xu; Wenjin Li; Zujia Wen; Ke Liu; Fang Huang; Jue Ji; Stephan Ripke; Weihua Yue; Patrick F Sullivan; Michael C O'Donovan; Yongyong Shi
Journal: Nat Genet Date: 2017-10-09 Impact factor: 38.330

4. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors: Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal: Proc Natl Acad Sci U S A Date: 2005-09-30 Impact factor: 11.205

5. Tyrosine phosphorylation of glutamate receptors by non-receptor tyrosine kinases: roles in depression-like behavior.

Authors: Li-Min Mao; John Q Wang
Journal: Neurotransmitter (Houst) Date: 2016-12-01

6. De novo CNV analysis implicates specific abnormalities of postsynaptic signalling complexes in the pathogenesis of schizophrenia.

Authors: G Kirov; A J Pocklington; P Holmans; D Ivanov; M Ikeda; D Ruderfer; J Moran; K Chambert; D Toncheva; L Georgieva; D Grozeva; M Fjodorova; R Wollerton; E Rees; I Nikolov; L N van de Lagemaat; A Bayés; E Fernandez; P I Olason; Y Böttcher; N H Komiyama; M O Collins; J Choudhary; K Stefansson; H Stefansson; S G N Grant; S Purcell; P Sklar; M C O'Donovan; M J Owen
Journal: Mol Psychiatry Date: 2011-11-15 Impact factor: 15.992

7. Analysis of exome sequence in 604 trios for recessive genotypes in schizophrenia.

Authors: E Rees; G Kirov; J T Walters; A L Richards; D Howrigan; D H Kavanagh; A J Pocklington; M Fromer; D M Ruderfer; L Georgieva; N Carrera; P Gormley; P Palta; H Williams; S Dwyer; J S Johnson; P Roussos; D D Barker; E Banks; V Milanova; S A Rose; K Chambert; M Mahajan; E M Scolnick; J L Moran; M T Tsuang; S J Glatt; W J Chen; H-G Hwu; B M Neale; A Palotie; P Sklar; S M Purcell; S A McCarroll; P Holmans; M J Owen; M C O'Donovan
Journal: Transl Psychiatry Date: 2015-07-21 Impact factor: 6.222

8. The contribution of rare variants to risk of schizophrenia in individuals with and without intellectual disability.

Authors: Tarjinder Singh; James T R Walters; Mandy Johnstone; David Curtis; Jaana Suvisaari; Minna Torniainen; Elliott Rees; Conrad Iyegbe; Douglas Blackwood; Andrew M McIntosh; Georg Kirov; Daniel Geschwind; Robin M Murray; Marta Di Forti; Elvira Bramon; Michael Gandal; Christina M Hultman; Pamela Sklar; Aarno Palotie; Patrick F Sullivan; Michael C O'Donovan; Michael J Owen; Jeffrey C Barrett
Journal: Nat Genet Date: 2017-06-26 Impact factor: 38.330

9. Prevalence and architecture of de novo mutations in developmental disorders.

Authors:
Journal: Nature Date: 2017-01-25 Impact factor: 49.962

10. The Ensembl Variant Effect Predictor.

Authors: William McLaren; Laurent Gil; Sarah E Hunt; Harpreet Singh Riat; Graham R S Ritchie; Anja Thormann; Paul Flicek; Fiona Cunningham
Journal: Genome Biol Date: 2016-06-06 Impact factor: 13.583
