Karen L Oliver1, Vesna Lukic1, Saskia Freytag1, Ingrid E Scheffer1, Samuel F Berkovic1, Melanie Bahlo1. 1. Epilepsy Research Centre (K.L.O., I.E.S., S.F.B.), Department of Medicine, Austin Health, University of Melbourne, Heidelberg, Australia; Population Health and Immunity Division (V.L., S.F., M.B.), The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia; Florey Institute (I.E.S.), Melbourne, Australia; Department of Paediatrics (I.E.S.), University of Melbourne, Royal Children's Hospital, Melbourne, Australia; and Department of Mathematics and Statistics (M.B.) and Department of Medical Biology (M.B.), University of Melbourne, Australia.
Abstract
OBJECTIVE: To evaluate the performance of an in silico prioritization approach that was applied to 179 epileptic encephalopathy candidate genes in 2013 and to expand the application of this approach to the whole genome based on expression data from the Allen Human Brain Atlas. METHODS: PubMed searches determined which of the 179 epileptic encephalopathy candidate genes had been validated. For validated genes, it was noted whether they were 1 of the 19 of 179 candidates prioritized in 2013. The in silico prioritization approach was applied genome-wide; all genes were ranked according to their coexpression strength with a reference set (i.e., 51 established epileptic encephalopathy genes) in both adult and developing human brain expression data sets. Candidate genes ranked in the top 10% for both data sets were cross-referenced with genes previously implicated in the epileptic encephalopathies due to a de novo variant. RESULTS: Five of 6 validated epileptic encephalopathy candidate genes were among the 19 prioritized in 2013 (odds ratio = 54, 95% confidence interval [7,∞], p = 4.5 × 10⁻⁵, Fisher exact test); one gene was a false negative. A total of 297 genes ranked in the top 10% for both the adult and developing brain data sets based on coexpression with the reference set. Of these, 9 had been previously implicated in the epileptic encephalopathies (FBXO41, PLXNA1, ACOT4, PAK6, GABBR2, YWHAG, NBEA, KNDC1, and SELRC1). CONCLUSIONS: We conclude that brain gene coexpression data can be used to assist epileptic encephalopathy gene discovery and propose 9 genes as strong epileptic encephalopathy candidates worthy of further investigation.
Currently, the genetic diagnostic yield for epileptic encephalopathies using high-throughput sequencing technologies is 25%–30%.[1] Although whole-exome sequencing has entered the clinical arena, data interpretation remains a considerable challenge for the majority of patients, who remain unsolved. When trios are studied, the presence of a de novo mutation in an established disease gene is usually diagnostic. However, the interpretation of de novo mutations in candidate genes remains difficult because healthy controls have 0–3 (median 1) de novo exonic variants.[2] There is now a growing list of candidate epileptic encephalopathy genes that harbor a plausible (e.g., novel and likely functional) de novo variant in a single patient.
In 2013, the Epi4K/EPGP Consortia performed whole-exome sequencing on 264 epileptic encephalopathy trios.[3] The Consortia identified >300 de novo variants, with the majority representing “single hits” in genes not previously implicated in epilepsy. We developed and applied an in silico prioritization approach[4] to a subset of these candidate epileptic encephalopathy genes (n = 179). Those candidate genes with de novo variants deemed most likely to be pathogenic (e.g., nonsynonymous or splice-site) were chosen. Our in silico approach used data from the Allen Human Brain Atlas.[5] We prioritized 19 of 179 candidate genes in 2013 because of high brain coexpression with established epileptic encephalopathy genes, based on an empirical false discovery rate of 0.25.[4]
New epileptic encephalopathy genes have since been confirmed. This provides an opportunity to validate the performance of our prioritization approach based on gene coexpression data (BrainGEP: http://bioinf.wehi.edu.au/software/BrainGEP/) and to expand its application to the wider genome.
METHODS
The original reference set of 29 established epileptic encephalopathy genes (table e-1 at Neurology.org/ng) was identified by PubMed searches using the keywords “epilepsy,” “epileptic encephalopathy,” and “genetics” in June 2013.[4] Using the same search terms, we formed an updated list of epileptic encephalopathy genes published between June 2013 and August 2015. For a gene to be established as a causal epileptic encephalopathy gene, we required that variants in the same gene, with a similar epileptic encephalopathy[6] clinical presentation, be confidently implicated in multiple individuals.[7] To be confidently implicated, the reported variants were required to meet the American College of Medical Genetics and Genomics guidelines for “pathogenic” or “likely pathogenic” classification (table e-2).[8]
Performance evaluation.
Newly established epileptic encephalopathy genes were cross-referenced for overlap with the list of 179 candidate genes used in our original study.[4] For each gene present in the candidate gene list, we noted whether it was one of the 19 genes prioritized by BrainGEP, in which case the prioritization was considered validated.
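The cross-referencing step described above amounts to simple set operations. The following is a minimal sketch in Python, assuming gene symbols are held as plain sets; the function name `cross_reference` and the toy gene symbols are hypothetical and are not part of BrainGEP or the study data.

```python
def cross_reference(candidates, prioritized, validated):
    """Count candidate genes in each cell of a prioritized-by-validated 2x2 table."""
    candidates = set(candidates)
    # Restrict both lists to the candidate set, so that newly established
    # genes that were never candidates are ignored.
    prioritized = set(prioritized) & candidates
    validated = set(validated) & candidates
    return {
        "prioritized_validated": len(prioritized & validated),
        "prioritized_not_validated": len(prioritized - validated),
        "not_prioritized_validated": len(validated - prioritized),
        "not_prioritized_not_validated": len(candidates - prioritized - validated),
    }


# Toy example with hypothetical gene symbols (not the study data).
toy_counts = cross_reference(
    candidates={"G1", "G2", "G3", "G4", "G5"},
    prioritized={"G1", "G2"},
    validated={"G2", "G5", "G9"},  # G9 is outside the candidate list
)
print(toy_counts)
```

The four counts returned are the cells of a 2 × 2 contingency table of the kind that a Fisher exact test takes as input.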
Genome-wide prioritization.
The updated list of established epileptic encephalopathy genes was used to form a new reference set. This reference set (n = 51; table e-1) was used to prioritize the 13,157 and 12,365 genes represented in the adult and developing brain expression data sets, respectively, using BrainGEP. Genome-wide candidates that ranked in the top 10% for both data sets were cross-referenced to genes reported with a Sanger-validated de novo variant, typically in a single case, by the EuroEPINOMICS-RES and Epi4K Consortia.[9]
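The general idea of ranking genes by coexpression with a reference set and keeping those in the top 10% of both data sets can be sketched as follows. This is an illustration only: the scoring rule (mean absolute Pearson correlation with the reference genes), the function names, and the toy expression profiles are all assumptions, and BrainGEP's actual scoring is not described in this text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_by_coexpression(expression, reference):
    """Rank non-reference genes, strongest first.

    Assumed score: mean absolute correlation with the reference genes.
    """
    refs = [g for g in reference if g in expression]
    if not refs:
        raise ValueError("no reference genes present in the expression data")
    scores = {
        gene: sum(abs(pearson(profile, expression[r])) for r in refs) / len(refs)
        for gene, profile in expression.items()
        if gene not in reference
    }
    return sorted(scores, key=scores.get, reverse=True)


def top_fraction(ranked, fraction=0.10):
    """Genes in the top `fraction` of a ranked list (at least one gene)."""
    k = max(1, int(len(ranked) * fraction))
    return set(ranked[:k])


def prioritized_in_both(adult, developing, reference, fraction=0.10):
    """Genes in the top `fraction` of both the adult and developing rankings."""
    top_adult = top_fraction(rank_by_coexpression(adult, reference), fraction)
    top_dev = top_fraction(rank_by_coexpression(developing, reference), fraction)
    return top_adult & top_dev


# Toy data: one hypothetical reference gene "R" and two candidates "A" and "B".
adult = {"R": [1, 2, 3, 4], "A": [2, 4, 6, 9], "B": [1, -1, 1, -1]}
developing = {"R": [1, 2, 3, 4], "A": [1, 2, 3, 5], "B": [3, 1, 4, 1]}
shared_top = prioritized_in_both(adult, developing, {"R"}, fraction=0.5)
print(shared_top)
```

Requiring a gene to rank highly in both data sets is a conservative choice: a candidate supported by only one of the two brain expression resources is not carried forward.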
RESULTS
Since June 2013, of the 179 Epi4K/EPGP candidate genes with “single hits,” 6 have been established as epileptic encephalopathy genes: GNAO1,[10] GRIN2B,[11] DNM1,[9] SLC35A2,[12] KCNB1,[13] and GRIN1.[14] Five of the 6 now-validated candidates were prioritized in 2013, representing true positives (odds ratio = 54, 95% confidence interval [7,∞], p = 4.5 × 10⁻⁵, one-sided Fisher exact test) (table 1). SLC35A2 on the X chromosome represents the single false-negative finding. This gene ranked in the top 40% of the 179 candidate genes; genes in the top 10% were prioritized.[4]
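The one-sided Fisher exact p value can be checked from the counts given in the text: 19 of 179 candidates were prioritized, 6 were validated, and 5 were both, so the remaining cells are 14, 1, and 159. The sketch below uses only the Python standard library; the function name `fisher_one_sided` is hypothetical. Note that the simple cross-product odds ratio computed here need not equal the reported value of 54, which may be a conditional maximum-likelihood estimate (an assumption, as the estimator is not stated in this text).

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided (greater) Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of observing `a` or more in the
    top-left cell, with the row and column totals held fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(total - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(total, col1)


# Rows: prioritized / not prioritized; columns: validated / not validated.
a, b, c, d = 5, 14, 1, 159
p_value = fisher_one_sided(a, b, c, d)
cross_product_or = (a * d) / (b * c)  # sample odds ratio, not a conditional MLE

print(f"p = {p_value:.2e}")
print(f"cross-product odds ratio = {cross_product_or:.1f}")
```

The p value obtained this way rounds to 4.5 × 10⁻⁵, in agreement with the reported result.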
Table 1
Summary of prioritized vs validated candidate epileptic encephalopathy genes from original study[4]
A total of 297 genes ranked in the top 10% of genome-wide candidates based on their coexpression, in both the adult and developing human brain, with the 51 reference epileptic encephalopathy genes. Of these top-ranked genome-wide candidates (table e-3), 9 were reported by the EuroEPINOMICS-RES and Epi4K Consortia[9] and therefore have already been implicated in the epileptic encephalopathies with a de novo variant, typically in a single case (table 2).
Table 2
Nine previously implicated epileptic encephalopathy candidate genes[9] prioritized in the top 10% of the genome based on adult and developing brain gene coexpression with 51 established causative genes
DISCUSSION
Genetic research has been revolutionized by high-throughput sequencing technology; no longer is the rate-limiting step data generation but rather the interpretation of these data. This can be particularly challenging for diseases with appreciable genetic heterogeneity, such as the epileptic encephalopathies,[15] where a common challenge is the interpretation of novel genes with a plausible de novo variant in a single case. Here we have demonstrated the merit of incorporating brain-specific gene coexpression data to add a further layer of information for or against candidates by way of in silico gene prioritization. In addition, we used this information to identify a small number of the most promising epileptic encephalopathy candidate genes from the whole genome.
We systematically analyzed the performance of our in silico approach that prioritized 19 candidate epileptic encephalopathy genes as those most likely to be pathogenic from a list of 179 in 2013.[4] Since then, 6 of the 179 candidates have been confirmed as new epileptic encephalopathy genes, 5 of which had been prioritized, demonstrating noteworthy success. This reinforces the remaining 14 prioritized genes[4] as strong epileptic encephalopathy candidates; it is expected that future publications will result in a number of them being validated.
The one validated epileptic encephalopathy candidate gene that was not prioritized by our approach, SLC35A2, is located on the X chromosome. Complex mechanisms of dosage compensation balance X-linked and autosomal gene expression levels; however, substantial variability can be seen between individuals and tissue types.[16] It may be that this complexity somewhat compromised the result for SLC35A2; however, IQSEC2 is also located on the X chromosome and this candidate gene was one of the 19 prioritized. IQSEC2 is a well-established intellectual disability gene, and although rare cases have been reported with seizures,[17] it did not meet our criteria for an established epileptic encephalopathy gene.
Having demonstrated the validity of our approach, we applied BrainGEP to the whole genome and prioritized candidates according to their coexpression with an updated reference set of 51 established epileptic encephalopathy genes. Of the 297 top-ranked candidate genes, 9 had been previously implicated in the epileptic encephalopathies due to the presence of a de novo mutation but had not been statistically confirmed.[9] The prioritization of these genes (table 2) provides an additional layer of support for their role in the pathogenesis of the epileptic encephalopathies, particularly because the prioritization is based on coexpression data from relevant tissue (i.e., brain). We suggest that these 9 candidates are those most likely to validate and thus are excellent targets for candidate gene resequencing approaches.[18] In fact, the prioritization of GABBR2 as one of the 9 candidate genes further reinforces this, as evidence for this gene is already quite strong. The EuroEPINOMICS-RES and Epi4K Consortia reported de novo mutations in GABBR2 in 2 unrelated individuals with epileptic encephalopathy.[9] However, this did not reach statistical significance, so the evidence for GABBR2 being causative was classified as only “suggestive” by the authors.[9]
In silico prioritization results are predictions based on the quantitative interpretation of biological networks captured by the data; results should not be interpreted as strong or independent lines of evidence for pathogenicity. Specific limitations to the approach include the assumption that similar syndromes are caused by mutations in genes that form part of the same biological pathway(s) as established disease genes (i.e., the reference set). This means that genes representing novel biological pathways are disadvantaged, as predicted gene-gene associations with the reference set are unlikely. The ability of in silico prioritization approaches to predict these gene-gene associations is, in turn, directly related to the quality of data sources used. An advantage of our approach is that it targets the disease of interest by using gene coexpression data from the brain. However, other data sources, such as text mining and protein-protein interactions, may have detected additional gene-gene associations not captured by expression data.
Despite the limitations, this work has highlighted how brain gene coexpression data can be harnessed to uncover important biological networks for the epileptic encephalopathies. This approach has the potential to frame future research strategies and therapeutic development. Our in silico prioritization work continues to evolve and now incorporates a new methodologic approach (RUVcorr) that denoises large gene expression data resources with an emphasis on extracting gene coexpression signals.[19] By using expression data from the brain, the application of this work is not limited to patients with epileptic encephalopathy but can be used to target the broader epilepsies and other neurologic diseases as well. We propose this as a valuable starting point for selecting the most promising candidate genes to target in resequencing experiments or to focus on when reanalyzing the exome data of “unsolved” patients (e.g., the Epilepsy Genetics Initiative) and when faced with a long list of novel putative causative genes.