Literature DB >> 25063994

Describing the genetic architecture of epilepsy through heritability analysis.

Doug Speed¹, Terence J O'Brien², Aarno Palotie³, Kirill Shkura⁴, Anthony G Marson⁵, David J Balding⁶, Michael R Johnson⁷.

Abstract

Epilepsy is a disease with substantial missing heritability; despite its high genetic component, genetic association studies have had limited success detecting common variants which influence susceptibility. In this paper, we reassess the role of common variants on epilepsy using extensions of heritability analysis. Our data set consists of 1258 UK patients with epilepsy, of which 958 have focal epilepsy, and 5129 population control subjects, with genotypes recorded for over 4 million common single nucleotide polymorphisms. Firstly, we show that on the liability scale, common variants collectively explain at least 26% (standard deviation 5%) of phenotypic variation for all epilepsy and 27% (standard deviation 5%) for focal epilepsy. Secondly we provide a new method for estimating the number of causal variants for complex traits; when applied to epilepsy, our most optimistic estimate suggests that at least 400 variants influence disease susceptibility, with potentially many thousands. Thirdly, we use bivariate analysis to assess how similar the genetic architecture of focal epilepsy is to that of non-focal epilepsy; we demonstrate both significant differences (P = 0.004) and significant similarities (P = 0.01) between the two subtypes, indicating that although the clinical definition of focal epilepsy does identify a genetically distinct epilepsy subtype, there is also scope to improve the classification of epilepsy by incorporating genotypic information. Lastly, we investigate the potential value in using genetic data to diagnose epilepsy following a single epileptic seizure; we find that a prediction model explaining 10% of phenotypic variation could have clinical utility for deciding which single-seizure individuals are likely to benefit from immediate anti-epileptic drug therapy.

Entities: Chemical

Keywords: association studies; complex trait prediction; epilepsy; heritability analysis

Mesh：

Year: 2014 PMID： 25063994 PMCID： PMC4163034 DOI： 10.1093/brain/awu206

Source DB: PubMed Journal: Brain ISSN： 0006-8950 Impact factor: 13.501

Introduction

Epilepsy is a common, serious neurological disease, defined by an enduring predisposition to epileptic seizures (Fisher ), which across North America and Europe affects approximately five people in every 1000 (Banerjee ). It is a highly heterogeneous condition that encompasses a spectrum of clinical subtypes, defined by EEG, seizure type and brain imaging criteria. Although clinical classifications are constantly evolving and remain a source of debate (Korff and Scheffer, 2013), patients with epilepsy can be divided into one of two broad categories: focal epilepsy, defined as seizures originating within one cerebral hemisphere, and non-focal epilepsy, of which the majority have generalized epilepsy and a smaller proportion are unclassifiable (Berg ). Where care has been taken with seizure classification, ∼60% of people with epilepsy are classified as focal (Banerjee ). Although traditional estimates of heritability for epilepsy vary greatly, depending on the method used, the population sampled and the mixture of clinical subtypes considered, studies have consistently demonstrated that the condition has a substantial genetic component; estimates of heritability from twin studies typically fall in the range 25–70% (Miller ; Kjeldsen ). By contrast, the molecular genetic factors affecting common forms of epilepsy remain poorly understood. The largest genome-wide association study (GWAS) to date, which considered 3445 Caucasian patients with focal epilepsy and 6935 control subjects, found no single nucleotide polymorphisms (SNPs) significantly associated with risk (Kasperavičirūt ė ). In a smaller GWAS of 1087 Chinese focal patients and 3444 controls, genome-wide significance was achieved by a single locus on 1q32.1 (minimum P = 1 × 10−8; Guo ), whereas a GWAS of 1527 European generalized patients and 2451 controls reported significant loci at 2p16.1 and 17q21.32 (minimum P = 2 × 10−9 and P = 9 × 10−9, respectively; EPICURE Consortium ). None of these three loci have yet been replicated in independent studies. For a minority of epilepsy cases, rare copy number variants have been shown to confer risk for both focal and non-focal forms of epilepsy (Dibbens ; de Kovel ; Heinzen ), whereas for uncommon, monogenic forms of epilepsy, many causal genes have been identified using linkage analysis or exome sequencing of parent-offspring trios (Heinzen ; Epi4K Consortium ; Hildebrand ); however, altogether these findings explain only a small fraction of the overall population susceptibility of epilepsy. To date, GWAS have predominantly focussed on marginal (single-SNP) analyses, where each SNP is tested individually for association with the phenotype. To allow for the large number of SNPs being tested (from a few hundred thousand to many millions), a SNP is only declared associated if its P-value is below a stringent significance threshold (typically P < 5 × 10−8). So although GWAS have successfully discovered SNPs influential for (or which tag variants which influence) a large variety of phenotypes, with standard sample sizes, they will struggle to detect SNPs of moderate or weak effect. In recent years, methods have been developed for assessing the joint influence of multiple SNPs on phenotypes. A major advantage of these methods is that they can appreciate the contribution of variants with effect sizes too small to be picked up through traditional marginal analysis. In particular, it has been shown that by applying SNP-based heritability analysis to GWAS of nominally unrelated individuals, it is possible to estimate the total variance explained by common variants (Yang , 2011; Speed ), whereas by using a bivariate extension of the method it is possible to examine the amount of overlap between the genetic architecture of two traits (Lee ). In this study, we use extensions of SNP-based heritability analysis to reconsider the impact on susceptibility to epilepsy of common variants, defined as those with minor allele frequency >0.01, and to describe the genetic architecture of the disease. Our data set consists of genome-wide SNP data for 1258 epilepsy cases and 5129 population controls. Although marginal analysis finds no individual SNPs significantly associated with susceptibility to epilepsy, we determine that collectively common SNPs explain a sizeable proportion of phenotypic variation: 26% [standard deviation (SD) 5%] when considering all epilepsy and 27% (SD 6%) when considering patients with focal epilepsy. These estimates account for inflation due to population structure and genotyping errors. By considering genome-wide distributions of heritability that are consistent with the results from our association and heritability analyses, we show that epilepsy is a highly polygenic trait with a minimum of 400 susceptibility loci, but potentially many thousands, and that the majority of heritability resides with loci which individually explain only a small fraction (<0.04%) of phenotypic variation. These results indicate that large meta-analyses, involving tens of thousands of individuals, will be required to confidently detect individual SNPs influencing susceptibility to epilepsy. One measure of the genetic similarity between two traits is ρ, the correlation between SNP effect sizes for each trait: ρ = 1 indicates the two traits have identical genetic aetiologies whereas ρ = 0 indicates no overlap. Using bivariate analysis (Lee ), we estimate ρ = 0.45 for focal and non-focal epilepsy. This result shows that there are significant differences between the genetic architectures of the two subtypes (P = 0.004 when testing the hypothesis ρ = 1), reinforcing the belief that focal and non-focal forms of epilepsy represent distinct disorders. However, it also demonstrates a significant overlap between these two subtypes of epilepsy (P = 0.01 when testing ρ = 0); this suggests there is scope to improve current clinical classifications, perhaps by incorporating genetic markers, which would facilitate the identification of subtype-specific genetic associations. Our result emphasizes the importance of considering both ‘all epilepsy’ and its individual clinical subgroups in the search for susceptibility loci. Conceptually, epilepsy is said to exist after at least one unprovoked, non-febrile seizure and when there is a high risk of recurrence (Fisher ). As ∼50–60% of individuals who experience a first, unprovoked, non-febrile epileptic seizure will never experience a second (Berg ), on a practical level the diagnosis of epilepsy has traditionally relied on the presence of at least two epileptic seizures, as then the chance of recurrence is sufficiently high (60–90%; Hauser ). Recently, the International League Against Epilepsy (ILAE) have adopted a new definition of epilepsy, such that an individual experiencing their first seizure should be considered to have epilepsy if their probability of further seizures is similar to the general recurrence risk after two seizures (Fisher ). Given the substantial estimates of variance explained, then with sufficient sample size it should be possible to construct a reasonable prediction model for epilepsy using genome-wide SNP data. Although the low prevalence of epilepsy means that this model would have limited value in terms of predicting which individuals in the general population will develop epilepsy, we examine how well they could assist the diagnosis of epilepsy for patients who have experienced a first epileptic seizure. We find that the ability of such a model to predict which single-seizure individuals will have subsequent seizures depends heavily on the distribution of liability values of individuals for whom the first seizure remains an isolated event. In the best case scenario, we determine that to achieve AUC (area under the receiver operating curve) > 0.75, which has been considered a threshold for clinical use (Janssens ), it is necessary to construct a prediction model explaining 10% of phenotypic variation.

Materials and methods

Epilepsy patients were recruited to our study by epilepsy specialists at UK epilepsy centres (Speed ). Inclusion criteria for the study were: (i) epilepsy patients aged 5 years or older; (ii) two or more unprovoked, non-febrile seizures; and (iii) able to give informed consent. Exclusion criteria were: (i) provoked seizures (e.g. alcohol); (ii) acute symptomatic seizures (e.g. acute brain injury); and (iii) progressive neurological disease (e.g. brain tumour). Patients were classified according to ILAE guidelines (Commission on Classification and Terminology of the ILAE, 1989; Berg ). Epilepsy subtype was determined by clinicians, and classified as focal, generalized or unclassified (unclassified where there was no evidence for focal onset either clinically or from neuroimaging, but where the EEG did not show a generalized epileptic discharge). A breakdown of epilepsy subtypes is provided in Supplementary Table 1. After quality control, our data comprise 1258 UK epilepsy patients (958 of subtype focal, 151 generalized and 149 unclassified), combined with 5129 controls (2655 from the 1958 Birth Cohort and 2464 from the National Blood Service; The Wellcome Trust Case Control Consortium, 2007). Before imputation, individuals were recorded for 299 735 autosomal SNPs with minor allele frequency >0.01; after imputation against the 1000 Genome Phase I June 2011 (interim) reference panel using IMPUTE2 (The 1000 Genomes Project Consortium, 2010; Howie ), this number increased to 4 238 038. Our quality control steps (detailed in full in the Appendix) sought to exclude suspect samples, poorly genotyped or imputed SNPs, and apparent population outliers. Additionally, we removed close relatedness by filtering samples so that no pair remained with estimated kinship (computed using allelic correlations; Astle and Balding, 2009) >2.6%, a level of relatedness slightly lower than that expected between second cousins (Yang ). Details of our statistical analyses are provided in Appendix 1.

Results

Marginal association analysis

Figure 1 presents P-values from marginal tests of association for susceptibility to all epilepsy (1258 cases, 5129 control subjects). We used an additive logistic regression model, including as covariates sex and the five leading principal component axes (the appropriate number of axes to include was determined through heritability analysis, see below and Supplementary Fig. 1). The genomic inflation factor is 1.05. The smallest association P-value for any SNP is P = 1.9 × 10−6, considerably above 5 × 10−8, the conventional threshold for genome-wide significance. We had 80% power to detect a variant explaining >0.46% of liability variation (see below for an explanation of the liability model) and 50% power to detect a variant explaining >0.35% (Supplementary Fig. 2) We also performed the analysis using only focal epilepsy patients (958 cases), and using only non-focal patients (300 cases consisting of patients with generalized and unclassified epilepsy combined). Again, no SNPs reached genome-wide significance (Supplementary Fig. 3); the smallest P-values were P = 1.9 × 10−6 (inflation factor 1.05) and P = 8.2 × 10−8 (inflation factor 1.01), respectively. Our results for focal epilepsy are consistent with those of Kasperavičirūt ė , who concluded there was no evidence for common SNPs affecting focal epilepsy susceptibility with odds ratios >1.3 (their study included our control samples). Notably, we found no support for rs2292096 within CAMSAP1 (in our analysis this SNP had P = 0.26, whereas the minimum P across the 178 SNPs within this gene is 0.22). The study identifying this SNP considered focal epilepsy patients of Chinese ancestry (Guo ) whereas our patients are of European ancestry. We also consider generalized epilepsy (151 cases), identifying a single locus within SYNRG (top SNP rs116499908, P = 3.3 × 10−8; inflation factor 1.01).

Figure 1

Manhattan plot for single SNP tests of association. Points report -log10 P-values from single-SNP tests of association for the phenotype all epilepsy (1258 cases, 5129 controls). Red/blue points correspond to genotyped SNPs, grey to imputed. The conventional threshold for genome-wide significance (5 × 10−8) is marked by a horizontal dashed line. Manhattan plots for the phenotypes focal, non-focal and generalized epilepsy are provided in Supplementary Fig. 3.

Variance explained by common single nucleotide polymorphisms

When considering a disease phenotype, it is convenient to suppose an underlying liability model (Supplementary Fig. 4); this assumes that case/control status is determined according to whether or not an individual’s liability, an unobservable, normally distributed random variable, lies above or below a threshold (Falconer and Mackay, 1996). On this scale, estimates of variance explained are invariant to disease prevalence and study ascertainment. To estimate , the proportion of phenotypic liability variation which can be attributed to common SNPs, we first use LDAK (Speed ) to calculate a kinship matrix based on allelic correlations across autosomes, using an additive encoding of SNPs, with values centred and scaled to have mean zero and variance one. Then, including as fixed effects sex and the top five principal component axes as used in the association analysis above, we use restricted maximum likelihood (REML) to estimate , the fraction of phenotypic variation on the observed scale (cases 1, controls 0) attributable to the kinship matrix. is then related to via where K is the disease prevalence, p is the proportion of cases in the sample, and z is the standard normal density at the liability threshold (Dempster and Lerner, 1950; Yang ) (Supplementary Fig. 4). The accuracy of this transformation has been questioned for extreme prevalences (Yang ), so to test its appropriateness for our study, we simulate phenotypes with K = 0.005 and P = 1258/6387 (the values we use when analysing the phenotype all epilepsy). We find that the resulting estimates of are on average 90% of the true values (Supplementary Fig. 5), indicating that heritability analysis will tend to moderately underestimate the total variance explained on the liability scale for low prevalence diseases. Table 1 reports our estimates of and for all epilepsy, and for the subtypes focal and non-focal (generalized and unclassified patients combined). As expected, estimates with imputed SNPs included are larger than those based only on genotyped SNPs, on average by about a quarter. The final column reports , which adjusts the corresponding estimate of for inflation due to population stratification and genotyping errors (see below). To benchmark estimates of , it is possible to estimate , the total liability heritability for each phenotype from estimates of prevalence and sibling relative risk (Falconer and Mackay, 1996; Wray ). Using the values reported by Ottman and Peljto , we estimate for all epilepsy = 32% (95% CI 24–41), for focal epilepsy = 23% (95% CI 5–43), and for non-focal epilepsy = 36% (95% CI 15–59). Despite their limited precision, the estimates of suggest that for each epilepsy phenotype, common SNPs are able to explain the majority of liability heritability, and this conclusion holds for alternative estimates of the population prevalence (Supplementary Table 2).

Table 1

Estimates of variance explained by common SNPs

Phenotype	Population	Sibling	Total liability	Genotyped SNPs		Imputed SNPs
Phenotype	Prevalence	Relative risk	Heritability, hT2	hO2	hL2	hO2	hL2	hC2
All epilepsy (1258 cases)	0.005	3.3 [2.5–4.3]	32 [24–41]	31 (6)	23 (4)	42 (6)	31 (5)	26
Focal (958 cases)	0.003	2.6 [1.2–5.3]	23 [5–43]	33 (6)	27 (5)	41 (7)	33 (5)	27
Non-focal (300 cases)	0.002	4.7 [2.1–10.8]	36 [15–59]	21 (7)	38 (12)	24 (8)	46 (14)	44

For each phenotypes, we report estimates of , the percentage of variance explained on the observed scale (cases 1, controls 0), and , the corresponding estimate on the liability scale (standard deviations provided in parentheses). is obtained from by subtracting the estimated inflation due to population stratification and genotyping errors (see text). For comparison, , the total liability heritability, is estimated for each phenotype based on the prevalence and reported estimates of sibling relative risk (95% CI shown in square brackets).

Estimates of variance explained by common SNPs For each phenotypes, we report estimates of , the percentage of variance explained on the observed scale (cases 1, controls 0), and , the corresponding estimate on the liability scale (standard deviations provided in parentheses). is obtained from by subtracting the estimated inflation due to population stratification and genotyping errors (see text). For comparison, , the total liability heritability, is estimated for each phenotype based on the prevalence and reported estimates of sibling relative risk (95% CI shown in square brackets). The epilepsy cases were genotyped using a lower coverage array than the two wild-type control data sets used for this study (Illumina 660Q as opposed to Illumina 1.2 M), so for the main analysis we imputed case and control cohorts separately. However, we additionally imputed all three cohorts together, starting with the subset of SNPs present on all three arrays. The resulting estimate for all epilepsy was = 31.7%, almost identical to our previous estimate ( = 31.5%), suggesting that with sufficient quality control, separate imputation is not a concern (and generally much faster). When computing kinships, LDAK adjusts for uneven tagging (Speed ). For comparison, we also omitted this adjustment, following instead the method of Yang . For each phenotype, the resulting estimate of based on imputed SNPs was lower than that using only genotyped SNPs (Supplementary Table 3), a paradoxical result that demonstrates the importance of adjusting for tagging when performing heritability analysis. An additional benefit of the adjustment is that for the imputed data only ∼7% of the 4.3 million SNPs receive a non-zero weighting, effecting a 14-fold data reduction and speed-up when subsequently estimating variance explained.

Inflation due to residual familial relatedness, population structure and genotyping errors

The reason that we require individuals to be distantly related is because we want to estimate the variance explained only by causal variation in linkage disequilibrium with common SNPs. By contrast, when closely related pairs of individuals are included, they will tend to share long genomic regions leading to long-range tagging, and also estimates of will possibly include contributions from shared environmental factors. Additionally, estimates will depend on the degree of relatedness between the sampled individuals, whereas with unrelated individuals estimates should be stable across populations (because linkage disequilibrium tends to be stable). We also wish to avoid population differences between cases and controls; when these are present, then variants that correlate with these differences (for example, SNPs whose allele frequencies vary between populations) will contribute towards estimates of variance explained, whether or not they tag true causal variation. We previously formalized a test to measure inflation of estimates of variance explained due to residual relatedness and population structure (Speed ). For the phenotype all epilepsy, we calculate that when including five principal component axes as covariates, ∼9% of our estimate of corresponds to inflation from these sources (i.e. in absolute terms, our estimate was inflated by ∼3%); the 9% value becomes 8% for focal epilepsy and 1% for non-focal epilepsy (Supplementary Fig. 1). These results indicate that the cases and controls are sufficiently well-matched with respect to population and that estimates of variance explained are not substantially affected by residual relatedness. Estimates of can also be inflated by genotyping errors, and strict quality control is required when analysing binary outcomes for which cases and controls have been genotyped separately (Yang ; Speed ). In Appendix 1, we derive a formula for how the heritability of a binary trait changes according to relabeling or exclusion of samples. When two or more control data sets are present, this formula allows us to estimate inflation due to genotyping errors within control individuals; we provide proof through simulation (Supplementary Fig. 6). For ‘all epilepsy’, we estimate that inflation due to genotyping errors in controls accounts for ∼8% of our estimate for (in absolute terms 2%); for focal epilepsy this figure is 11%, for non-focal epilepsy 3% (Supplementary Table 4). Although having only one case data set prevents us from measuring inflation due to genotyping errors among case individuals, given that the same quality control steps were used, we expect this to be of similar magnitude. Our final estimates of , with these three sources of inflation discounted, are = 26% for all epilepsy, = 27% for focal epilepsy and = 44% for non-focal epilepsy.

Variance explained by reported susceptibility loci

The methodology above can also be used to estimate the variance explained by a subset of SNPs, by using only these SNPs when constructing the kinship matrix. Firstly, we consider the 6003 SNPs within 500 kb of rs2292096 (located at 1q32.1), rs13026414 (2p16.1) and rs72823592 (17q21.32), which correspond to the three loci identified through previous GWAS (above). The estimate of is <0.05%, regardless of which phenotype we consider, and in all cases not significantly different from zero (Supplementary Table 5). Secondly, we identify a list of 85 genes implicated in previous epilepsy studies by searching the UniProtKB database (http://www.uniprot.org) using the keyword ‘epilepsy’. Including the 119 630 SNPs located inside or within 20 kb of the transcription start or end sites of these genes, we estimate = 3.9% (SD 1.0%) for the phenotype all epilepsy; this estimate of variance explained is both significantly greater than zero (P < 10−4) and also significantly larger than expected given the number of SNPs involved (permuted P < 0.01; Supplementary Table 5), indicating that these genes do harbour susceptibility loci, but that collectively they account for a relatively small fraction of the total heritability of epilepsy.

Estimating the number of causal variants

We explain how it is possible to gain insights into the number of causal variants underlying a phenotype by considering possible ways the total variance explained by common SNPs is distributed across the genome. By causal variant, we mean any source of genetic variation which affects the phenotype (here, epilepsy susceptibility). As we are considering variation tagged by common SNPs, we expect most of these causal variants to also be common SNPs, but this is not necessarily the case because common SNPs will to some extent tag other sources of variation, such as rare variants and copy number variations (see the ‘Discussion’ section). For Fig. 2, we base calculations on the results of our association and heritability analysis for the phenotype all epilepsy; see Supplementary Fig. 7 for focal and non-focal epilepsy. We fix the total variance explained by all causal variants at 26%. For Fig. 2A, we vary the number of causal variants (x-axis) and the distribution of heritability across these variants (colour), then calculate the probability that no single variant achieves P < 1.9 × 10−6 from single-SNP association analysis (Appendix 1). For each distribution, we record how many causal variants are required for this probability to exceed 0.5 (our point estimate) or 0.05 (our estimated lower bound). The most parsimonious scenario is that all causal variants contribute equal heritability (black line), similar to the distribution so far observed for human height and schizophrenia (Kemper ), which would suggest 870 causal variants, with a minimum of 420. If we instead suppose the distribution of heritability is uniform (red line), 1230 causal variants are required (minimum 600); if exponential (green line), the distribution considered by Goldstein (2008), 2160 are needed (minimum 1060); if ‘χ2’ (a gamma distribution with shape parameter 0.5; blue line), the distribution that applies to heritability contributions if effect sizes are Gaussian (Yang ; Speed ), the number rises to 3390 (minimum 1650).

Figure 2

Estimating the number of causal variants. We suppose heritability is distributed over causal variants either equally (black), uniformly (red), exponentially (green) or χ2 (blue). (A) As the number of causal variants increases (x-axis), the average heritability of each variant decreases, and the probability of single-SNP analysis finding no significant associations increases (y-axis). For each distribution, our point estimates (lower bounds) for the number of causal variants are the numbers required for this probability to exceed 0.5 (0.05), and are marked by vertical lines. Based on the point estimates, the histograms in B show for each distribution how much heritability each causal variant explains. The values above bars report the proportion of variance explained by causal variants within each tranche. Given the point estimates for the number of causal variants for epilepsy, the histograms in Fig. 2B show the spread of heritability for each of the four distributions. Regardless of the distribution considered, the majority of total heritability is accounted for by variants that each explain <0.04% of phenotypic variation. Table 2 shows how the expected success of an epilepsy GWAS depends on its total sample size n. For example, if heritability is distributed exponentially with 2160 causal variants, and n is increased to 20 000, 50 000 or 100 000 (maintaining a similar case-control ratio) we would expect to detect 6, 103 or 401 of the 2160 causal variants, explaining in total 0.3%, 4.3% or 12.2% of liability variation. The figures are similar regardless of the assumed distribution, and in all cases, >12 500 samples are required before we can expect to find at least one causal variant, although the predicted success can be improved by increasing the ratio of cases to controls (Supplementary Table 6).

Table 2

Expected success of single-SNP analyses

Equal, 870 Causal loci	n = 6387	n = 12 500	n = 20 000	n = 50 000	n = 100 000
Expected number of associations	0.1	0.6	3.9	147	710
% of variance explained	0.0	0.0	0.1	4.4	21.4
Uniform, 1230 Causal loci	n = 6387	n = 12 500	n = 20 000	n = 50 000	n = 100 000
Expected number of associations	0.1	0.7	4.6	142	570
% of variance explained	0.0	0.0	0.2	4.9	17.8
Exponential, 2160 Causal loci	n = 6387	n = 12 500	n = 20 000	n = 50 000	n = 100 000
Expected number of associations	0.1	0.9	5.7	103	401
% of variance explained	0.0	0.0	0.3	4.3	12.2
Chi Squared, 3390 Causal loci	n = 6387	n = 12 500	n = 20 000	n = 50 000	n = 100 000
Expected number of associations	0.1	0.9	5.9	96	360
% of variance explained	0.0	0.1	0.3	4.1	11.1

For each assumed distribution of heritability across causal variants, using the corresponding point estimates for the number of causal variants, we estimate the expected number of causal variants detected and the total proportion of liability variation these explain, for different total sample size n. We assume the case-control ratio remains fixed at 1258:5129.

Expected success of single-SNP analyses For each assumed distribution of heritability across causal variants, using the corresponding point estimates for the number of causal variants, we estimate the expected number of causal variants detected and the total proportion of liability variation these explain, for different total sample size n. We assume the case-control ratio remains fixed at 1258:5129.

Overlap between subtypes

We can measure the concordance between focal and non-focal epilepsy by ρ, the correlation between the SNP effect sizes for each phenotype; ρ close to 1 indicates that the effect sizes for focal epilepsy are very similar to those for non-focal epilepsy, and would suggest that the clinical divide of patients into focal and non-focal has very little genetic basis. Using a bivariate extension to heritability analysis (Lee ), we estimate ρ = 0.45, a value significantly lower than 1 (P = 0.007 or P = 0.00002, depending on how the control datasets are matched with focal and non-focal cases; Supplementary Table 7), indicating that patients with focal epilepsy are genetically distinct. However, our estimate of ρ is also significantly greater than 0 (P = 0.02 or P = 0.001), indicating that many causal variants are common to both subtypes. These findings suggest that clinical classifications could be improved upon and that at present when conducting an epilepsy GWAS it is advisable to analyse epilepsy subtypes both separately and together.

Prediction models based on common single nucleotide polymorphisms

We investigated the potential for predicting an individual’s risk of developing epilepsy using a linear prediction model constructed from common SNPs. This model takes the form: where Gi is the predicted risk score for Individual i, Xij is the genotypic value of Individual i for SNP j, and βj is the corresponding SNP effect size. The performance of a prediction model can be measured by , the squared correlation between each individual's predicted risk Gi and their liability value ( can be computed from , the squared correlation between Gi and case/control status, using the transformation above). For prediction models constructed from common SNPs, represents an upper bound on ; how close performance will be to this upper bound in practice will depend on how accurately the effect sizes can be estimated, which in turn depends on the available sample size, the genetic architecture of the trait, and the performance of the estimation method used. Even if we constructed a prediction model with close to , due to the low overall prevalence of epilepsy, this model would still have limited use in identifying individuals from the general population likely to develop epilepsy. For example, a prediction model with = 0.3 would have AUC 0.89 (Supplementary Fig. 8), but assuming a population prevalence of 0.005, then of the 10% (1%) of individuals with highest predicted risk, only 3.2% (9.6%) would be expected to develop epilepsy. By contrast, for individuals who experience a single, unprovoked, non-febrile seizure, the prevalence of epilepsy is much higher (40–50%; Berg ). We therefore explore how well a SNP-based prediction model can distinguish single-seizure individuals who have epilepsy (i.e. those destined to have further seizures) from those who do not have epilepsy (i.e. for whom the first seizure remains an isolated event). A major factor influencing the success of our prediction model is the distribution of liabilities of non-recurrent individuals. Figure 3A and B show the ‘Best’ and ‘Worst’ case scenarios. Prediction will be best when the liability distribution for single-seizure individuals without epilepsy matches that of the general population (i.e. that of individuals who never experience any seizures), as then the difference between the average liability values for single-seizure individuals with and without epilepsy will be greatest. Conversely, the scenario where single-seizure individuals without epilepsy have liabilities just below the case/control threshold will prove most challenging.

Figure 3

Performance of prediction models for single-seizure patients. A and B illustrate the two extreme cases for the distribution of liabilities for single-seizure individuals who do not have epilepsy. In the ‘Best Case Scenario,’ their liability distribution matches that of population controls, while in the ‘Worst Case Scenario,’ their liabilities lie just below the case/control threshold. C and D show, again for the Best and Worst Case Scenarios, how the receiver operating curve depends on the proportion of variance explained by the prediction model (varied between 5% and 30%, indicated by line colour); the AUC (area under receiver operating curve) for each line is provided in parentheses. For the best case scenario, Fig. 3C shows how prediction accuracy depends on . To achieve AUC 0.75, = 0.1 is required (green line). Given such a model, if we were to pick the 10% (20%) of individuals with highest predicted risk, we would expect 80% (75%) of these to experience a second seizure, and we would have identified ∼18% (33%) of the individuals that will subsequently develop epilepsy. However, under the worst case scenario, performance is much poorer, and even a prediction model with = 0.3 would only achieve AUC 0.60 (Supplementary Fig. 8). These results demonstrate the need to recruit to studies individuals who experience a single seizure but do not develop epilepsy, as then it will be possible to investigate which liability distribution is most appropriate, and therefore be more precise about the potential success of prediction models in diagnosing epilepsy following a single epileptic seizure.

Discussion

We have shown how knowledge of , the total proportion of liability variation explained by common SNPs, tells us a large amount about the genetic architecture of the trait under consideration. Although we have focused on epilepsy, the same techniques can be used to improve our understanding of any disease. We have estimated that is at least 26% for the phenotype all epilepsy and 27% for focal epilepsy, in both cases indicating that common SNPs explain the majority of heritability. By combining results from association and SNP-based heritability analysis, we have provided evidence that epilepsy has many hundreds of susceptibility loci, and that studies comprising tens of thousands of samples and examining the genome in higher resolution (e.g. through full sequencing or via imputation against more comprehensive reference panels) will be required before we can expect to discover a reasonable number of susceptibility loci through single-SNP analyses. Heritability analysis can also be used to investigate the genetic concordance between subtypes. We have demonstrated that focal and non-focal epilepsy have distinct genetic architectures, reinforcing the view that epilepsy is a genetically heterogeneous trait and that association analyses should take account of clinically defined subtypes; but at the same time, we also found significant overlap between the two subtypes, meaning there is room for subtype definitions to be improved. Ideally, we would have considered subtypes more specific than the two broad categories considered here, but this was not feasible with the sample size available, thus emphasizing the importance of enhanced scientific collaboration to better understand the relationships between the many clinically defined epilepsy subtypes. provides an upper bound for , the performance of linear prediction models based on common SNPs. Although for polygenic traits it is unrealistic to expect to achieve very close to (or, equivalently, close to ), with large samples sizes, it should be possible to make reasonable progress. For example, for human height, has been estimated to be 0.45 (Yang ), and = 0.36 has been achieved (Makowsky ). If, relatively speaking, we are able to do even half as well for epilepsy, the resulting prediction model would explain ∼10% of liability variation and could be used, along with clinical factors, to identify which single-seizure patients are at high risk of experiencing subsequent seizures, therefore satisfying the risk-based definition of epilepsy recently adopted by the ILAE, and are likely to benefit from immediate treatment with anti-epileptic medication. We have taken care to avoid overestimating . In particular, we recognize that even with stringent quality control, genotyping errors can inflate estimates; for this reason, we have proposed a way to assess and adjust for this effect that we recommend using alongside an existing check for inflation due to population structure and residual relatedness (Speed ). There are many factors that might lead to underestimation of variance explained. For example, for prevalences similar to that of epilepsy, we have found that the liability transformation, relied upon for converting values between the observed and liability scale, can result in underestimation of by about a tenth. A more subtle effect comes from the implicit assumption that the population controls are epilepsy-free. However, even supposing 0.5% of controls (26 individuals) actually have epilepsy, we show that our estimate of would be only 1 − (1 − 0.005)2 ≈ 1% lower than the true value (see Appendix 1 and Supplementary Fig. 9). Imputing genotypes increased our estimate of by about a quarter. However, despite the high coverage of the 1000 Genome reference panel, it remains that some causal variation will be missed or only partially tagged. Finally, it should be remembered that while we have focused on common SNPs, these will partially tag rare causal variants, and so our estimate of will include a contribution from these (Dickson ). Similarly, although we assume a linear model, we can still detect additive components of effects that are dominant or epistatic (Zuk ). However, for many applications, how much of is truly attributable to common variation, rather than rare variants and epistasis, is of little importance. For example, our ability to detect a rare causal variant through a GWAS depends not on how much variance the causal variant explains directly, but on the variance explained by the best tagging common SNP, and likewise for loci harbouring interactions. Similarly, the success of a prediction model is how much variation it explains, not the accuracy of the individual effect size estimates.

36 in total

1. Genetic and environmental factors in epilepsy: a population-based study of 11900 Danish twin pairs.

Authors: M J Kjeldsen; K O Kyvik; K Christensen; M L Friis
Journal: Epilepsy Res Date: 2001-05 Impact factor: 3.045

2. Two-stage genome-wide association study identifies variants in CAMSAP1L1 as susceptibility loci for epilepsy in Chinese.

Authors: Youling Guo; Larry W Baum; Pak Chung Sham; Virginia Wong; Ping Wing Ng; Colin Hiu Tung Lui; Ngai Chuen Sin; Tak Hong Tsoi; Clara S M Tang; Johnny S H Kwan; Benjamin H K Yip; Su-Mei Xiao; G Neil Thomas; Yu Lung Lau; Wanling Yang; Stacey S Cherny; Patrick Kwan
Journal: Hum Mol Genet Date: 2011-11-24 Impact factor: 6.150

3. Heritability of Threshold Characters.

Authors: E R Dempster; I M Lerner
Journal: Genetics Date: 1950-03 Impact factor: 4.562

4. Rare deletions at 16p13.11 predispose to a diverse spectrum of sporadic epilepsy syndromes.

Authors: Erin L Heinzen; Rodney A Radtke; Thomas J Urban; Gianpiero L Cavalleri; Chantal Depondt; Anna C Need; Nicole M Walley; Paola Nicoletti; Dongliang Ge; Claudia B Catarino; John S Duncan; Dalia Kasperaviciūte; Sarah K Tate; Luis O Caboclo; Josemir W Sander; Lisa Clayton; Kristen N Linney; Kevin V Shianna; Curtis E Gumbs; Jason Smith; Kenneth D Cronin; Jessica M Maia; Colin P Doherty; Massimo Pandolfo; David Leppert; Lefkos T Middleton; Rachel A Gibson; Michael R Johnson; Paul M Matthews; David Hosford; Reetta Kälviäinen; Kai Eriksson; Anne-Mari Kantanen; Thomas Dorn; Jörg Hansen; Günter Krämer; Bernhard J Steinhoff; Heinz-Gregor Wieser; Dominik Zumsteg; Marcos Ortega; Nicholas W Wood; Julie Huxley-Jones; Mohamad Mikati; William B Gallentine; Aatif M Husain; Patrick G Buckley; Ray L Stallings; Mihai V Podgoreanu; Norman Delanty; Sanjay M Sisodiya; David B Goldstein
Journal: Am J Hum Genet Date: 2010-04-15 Impact factor: 11.025

5. Revised terminology and concepts for organization of seizures and epilepsies: report of the ILAE Commission on Classification and Terminology, 2005-2009.

Authors: Anne T Berg; Samuel F Berkovic; Martin J Brodie; Jeffrey Buchhalter; J Helen Cross; Walter van Emde Boas; Jerome Engel; Jacqueline French; Tracy A Glauser; Gary W Mathern; Solomon L Moshé; Douglas Nordli; Perrine Plouin; Ingrid E Scheffer
Journal: Epilepsia Date: 2010-02-26 Impact factor: 5.864

6. A map of human genome variation from population-scale sequencing.

Authors: Gonçalo R Abecasis; David Altshuler; Adam Auton; Lisa D Brooks; Richard M Durbin; Richard A Gibbs; Matt E Hurles; Gil A McVean
Journal: Nature Date: 2010-10-28 Impact factor: 49.962

7. Familial risk of epilepsy: a population-based study.

Authors: Anna L Peljto; Christie Barker-Cummings; Vincent M Vasoli; Cynthia L Leibson; W Allen Hauser; Jeffrey R Buchhalter; Ruth Ottman
Journal: Brain Date: 2014-01-26 Impact factor: 13.501

Review 8. Genetic architecture of body size in mammals.

Authors: Kathryn E Kemper; Peter M Visscher; Michael E Goddard
Journal: Genome Biol Date: 2012 Impact factor: 13.583

9. Genotype imputation with thousands of genomes.

Authors: Bryan Howie; Jonathan Marchini; Matthew Stephens
Journal: G3 (Bethesda) Date: 2011-11-01 Impact factor: 3.154

10. De novo mutations in epileptic encephalopathies.

Authors: Andrew S Allen; Samuel F Berkovic; Patrick Cossette; Norman Delanty; Dennis Dlugos; Evan E Eichler; Michael P Epstein; Tracy Glauser; David B Goldstein; Yujun Han; Erin L Heinzen; Yuki Hitomi; Katherine B Howell; Michael R Johnson; Ruben Kuzniecky; Daniel H Lowenstein; Yi-Fan Lu; Maura R Z Madou; Anthony G Marson; Heather C Mefford; Sahar Esmaeeli Nieh; Terence J O'Brien; Ruth Ottman; Slavé Petrovski; Annapurna Poduri; Elizabeth K Ruzzo; Ingrid E Scheffer; Elliott H Sherr; Christopher J Yuskaitis; Bassel Abou-Khalil; Brian K Alldredge; Jocelyn F Bautista; Samuel F Berkovic; Alex Boro; Gregory D Cascino; Damian Consalvo; Patricia Crumrine; Orrin Devinsky; Dennis Dlugos; Michael P Epstein; Miguel Fiol; Nathan B Fountain; Jacqueline French; Daniel Friedman; Eric B Geller; Tracy Glauser; Simon Glynn; Sheryl R Haut; Jean Hayward; Sandra L Helmers; Sucheta Joshi; Andres Kanner; Heidi E Kirsch; Robert C Knowlton; Eric H Kossoff; Rachel Kuperman; Ruben Kuzniecky; Daniel H Lowenstein; Shannon M McGuire; Paul V Motika; Edward J Novotny; Ruth Ottman; Juliann M Paolicchi; Jack M Parent; Kristen Park; Annapurna Poduri; Ingrid E Scheffer; Renée A Shellhaas; Elliott H Sherr; Jerry J Shih; Rani Singh; Joseph Sirven; Michael C Smith; Joseph Sullivan; Liu Lin Thio; Anu Venkat; Eileen P G Vining; Gretchen K Von Allmen; Judith L Weisenberg; Peter Widdess-Walsh; Melodie R Winawer
Journal: Nature Date: 2013-08-11 Impact factor: 49.962

39 in total

Review 1. Genetic Discoveries Drive Molecular Analyses and Targeted Therapeutic Options in the Epilepsies.

Authors: Ryan S Dhindsa; David B Goldstein
Journal: Curr Neurol Neurosci Rep Date: 2015-10 Impact factor: 5.081

2. Evaluating and improving heritability models using summary statistics.

Authors: Doug Speed; John Holmes; David J Balding
Journal: Nat Genet Date: 2020-03-23 Impact factor: 38.330

3. Copy number variation analysis in 83 children with early-onset developmental and epileptic encephalopathy after targeted resequencing of a 109-epilepsy gene panel.

Authors: Kyoko Hirabayashi; Daniela Tiaki Uehara; Hidetoshi Abe; Atsushi Ishii; Keiji Moriyama; Shinichi Hirose; Johji Inazawa
Journal: J Hum Genet Date: 2019-08-30 Impact factor: 3.172

Review 4. Genetic Variation and Response to Neurocritical Illness: a Powerful Approach to Identify Novel Pathophysiological Mechanisms and Therapeutic Targets.

Authors: Julián N Acosta; Stacy C Brown; Guido J Falcone
Journal: Neurotherapeutics Date: 2020-04 Impact factor: 7.620

5. Generalized, focal, and combined epilepsies in families: New evidence for distinct genetic factors.

Authors: Colin A Ellis; Ruth Ottman; Michael P Epstein; Samuel F Berkovic
Journal: Epilepsia Date: 2020-10-23 Impact factor: 5.864

Review 6. A roadmap for precision medicine in the epilepsies.

Authors:
Journal: Lancet Neurol Date: 2015-09-20 Impact factor: 44.182

Review 7. Rare genetic causes of complex kidney and urological diseases.

Authors: Emily E Groopman; Gundula Povysil; David B Goldstein; Ali G Gharavi
Journal: Nat Rev Nephrol Date: 2020-08-17 Impact factor: 28.314

Review 8. Contemporary scope of inborn errors of metabolism involving epilepsy or seizures.

Authors: Birutė Tumienė; Borut Peterlin; Aleš Maver; Algirdas Utkus
Journal: Metab Brain Dis Date: 2018-07-13 Impact factor: 3.584

9. Burden for Parents of Patients With Schizophrenia-A Nationwide Comparative Study of Parents of Offspring With Rheumatoid Arthritis, Multiple Sclerosis, Epilepsy, and Healthy Controls.

Authors: Ellenor Mittendorfer-Rutz; Syed Rahman; Antti Tanskanen; Maila Majak; Juha Mehtälä; Fabian Hoti; Erik Jedenius; Dana Enkusson; Amy Leval; Jan Sermon; Heidi Taipale; Jari Tiihonen
Journal: Schizophr Bull Date: 2019-06-18 Impact factor: 9.306

10. Systems genetics identifies Sestrin 3 as a regulator of a proconvulsant gene network in human epileptic hippocampus.

Authors: Jacques Behmoaras; Leonardo Bottolo; Michelle L Krishnan; Katharina Pernhorst; Paola L Meza Santoscoy; Michael R Johnson; Tiziana Rossetti; Doug Speed; Prashant K Srivastava; Marc Chadeau-Hyam; Nabil Hajji; Aleksandra Dabrowska; Maxime Rotival; Banafsheh Razzaghi; Stjepana Kovac; Klaus Wanisch; Federico W Grillo; Anna Slaviero; Sarah R Langley; Kirill Shkura; Paolo Roncon; Tisham De; Manuel Mattheisen; Pitt Niehusmann; Terence J O'Brien; Slave Petrovski; Marec von Lehe; Per Hoffmann; Johan Eriksson; Alison J Coffey; Sven Cichon; Matthew Walker; Michele Simonato; Bénédicte Danis; Manuela Mazzuferi; Patrik Foerch; Susanne Schoch; Vincenzo De Paola; Rafal M Kaminski; Vincent T Cunliffe; Albert J Becker; Enrico Petretto
Journal: Nat Commun Date: 2015-01-23 Impact factor: 14.919