Steffi Grote1, Kay Prüfer1, Janet Kelso1, Michael Dannemann2. 1. Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany. 2. Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany; Medical Faculty, University of Leipzig, Leipzig 04103, Germany.
Abstract
We present ABAEnrichment, an R package that tests for expression enrichment in specific brain regions at different developmental stages using expression information gathered from multiple regions of the adult and developing human brain, together with ontologically organized structural information about the brain, both provided by the Allen Brain Atlas. We validate ABAEnrichment by successfully recovering the origin of gene sets identified in specific brain cell-types and developmental stages. AVAILABILITY AND IMPLEMENTATION: ABAEnrichment was implemented as an R package and is available under GPL (≥ 2) from the Bioconductor website (http://bioconductor.org/packages/3.3/bioc/html/ABAEnrichment.html). CONTACTS: steffi_grote@eva.mpg.de, kelso@eva.mpg.de or michael_dannemann@eva.mpg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
1 Introduction
Functional enrichment analyses using ontologies such as the Gene Ontology have been widely used to gain insight into the functions and phenotypes influenced by sets of candidate genes that emerge from various genome-wide screens. Patterns of gene expression can be very informative about developmental processes and functions taking place in particular organs or tissues, but this requires extensive expression information, covering component tissues and developmental stages, for the organ or tissue of interest.
The Allen Human Brain Atlas projects have quantified gene expression in multiple regions of the human brain over the course of brain development, and provide a valuable resource that can be used to pinpoint sets of genes that are characteristic of particular brain regions and/or developmental stages (Hawrylycz et al., 2012; Miller et al., 2014; Allen Human Brain Atlas, human.brain-map.org; BrainSpan Atlas of the Developing Human Brain, brainspan.org). To date, the transcriptomes of multiple brain regions have been measured in adult humans as well as in individuals at various developmental stages (Supplementary Tables S1–S4). None of the tools developed to date (e.g. Brain Explorer, http://human.brain-map.org/static/brainexplorer) allows for the statistical evaluation of candidate genes in terms of expression enrichment in specific regions or stages using an ontology. Here, we demonstrate the utility of a tool to carry out enrichment analyses using the Allen Brain Atlas information.
2 Methods
To perform ontology-based gene set enrichment analyses for expression in specific brain regions, it is necessary to assign to each anatomical structure the genes that are expressed in that structure. We apply an expression threshold to identify expressed genes in this step. A workflow illustrating the entire data processing and enrichment analysis is shown in Supplementary Figure S1.
2.1 Gene expression data
We used two human brain expression datasets provided by the Allen Brain Atlas. First, gene expression in the adult human brain was measured in six adult individuals using a microarray (Microarray Survey, Oct. 2013 v.7, Supplementary Table S1); this dataset consists of normalized expression profiles for ∼16 000 genes in 414 brain regions (Microarray Data Normalization, March 2013 v.1, see Supplementary Table S2). Second, the developing human brain dataset consists of normalized RNA-Seq expression measurements (RPKM, 'RNA-Seq Gencode v10 summarized to genes') of ∼17 000 genes from 42 human donors classified into 31 different developmental periods ranging from 8 post-conception weeks to 40 years (Supplementary Table S3). Sixteen of the 26 available brain regions were sampled in donors of at least 20 different ages, while the remaining 10 brain regions were sampled in fewer than five ages (Supplementary Tables S3 and S4, brainspan.org).
2.2 Processing of expression data
For the adult human brain dataset, the normalized expression estimate for a gene in a brain region was computed by averaging the expression levels across all samples and probe sets for that gene and brain region (see Section 2.1). For the developing human brain dataset, we restricted the analysis to the 16 brain regions sampled in at least 20 age categories in order to increase the power to detect developmental effects. When testing for expression enrichment across five developmental stages (Supplementary Table S5), we computed the mean RPKM expression value of a gene across all samples for a given brain region and developmental stage. The input expression datasets were provided by the Allen Brain Atlas, and the processed data are available via our data package ABAData (http://bioconductor.org/packages/3.2/data/experiment/html/ABAData.html).
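The averaging step described above can be sketched as follows. This is a minimal illustration in Python, not the R code used in ABAEnrichment or ABAData; the function name `average_expression`, the tuple layout and the toy values are hypothetical and serve only to show how measurements are grouped and averaged per gene and brain region.

```python
from collections import defaultdict
from statistics import mean


def average_expression(measurements):
    """Average expression over all samples and probe sets per (gene, region).

    `measurements` is an iterable of (gene, region, value) tuples, one tuple
    per individual probe/sample measurement (hypothetical input layout).
    Returns a dict mapping (gene, region) to the mean expression value.
    """
    grouped = defaultdict(list)
    for gene, region, value in measurements:
        grouped[(gene, region)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Toy measurements (made-up values, not taken from the Allen Brain Atlas).
toy_measurements = [
    ("GENE_A", "region_1", 2.0),
    ("GENE_A", "region_1", 4.0),
    ("GENE_A", "region_2", 1.0),
    ("GENE_B", "region_1", 6.0),
]
averaged = average_expression(toy_measurements)
```

For the developmental dataset the same grouping would use (gene, region, developmental stage) as the key, so that the mean is taken within each stage.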
2.3 Ontologies
The Allen Brain Atlas provides non-overlapping ontologies that describe the developing and adult human brain and contain 3317 and 1534 brain regions, respectively. Direct expression data for the adult brain are available for 414 of the 1534 brain regions in the corresponding ontology. In a brain region without direct expression measurements, a gene is defined to be expressed if at least one sub-region shows expression evidence for that gene (Supplementary Table S6). For the developing human brain we restricted the corresponding ontology to the 16 brain regions with expression data available for at least 20 age time points. When including the superstructures inheriting expression data from these 16 brain regions, a total of 47 brain regions were used in the enrichment analysis (Supplementary Table S6).
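The inheritance rule for regions without direct measurements can be illustrated with a short sketch. It is written in Python for illustration only and is not the package's implementation; the function name `propagate_expression`, the dictionary layout and the toy ontology are hypothetical. The sketch also assumes that a region with direct measurements keeps its own gene set unchanged, which the text above does not state explicitly.

```python
def propagate_expression(children, measured):
    """Resolve the set of expressed genes for every region in an ontology.

    children: dict mapping a region to the list of its direct sub-regions.
    measured: dict mapping a region to the set of genes with direct
              expression evidence in that region.
    A region without direct measurements inherits every gene expressed in
    at least one of its sub-regions. Assumption: a region with direct
    measurements is left as measured.
    """
    resolved = {}

    def genes_for(region):
        if region in resolved:
            return resolved[region]
        if region in measured:
            genes = set(measured[region])
        else:
            genes = set()
            for child in children.get(region, []):
                genes |= genes_for(child)
        resolved[region] = genes
        return genes

    for region in set(children) | set(measured):
        genes_for(region)
    return resolved


# Toy ontology with made-up region and gene names.
toy_children = {
    "brain": ["cortex", "white_matter"],
    "cortex": ["frontal", "temporal"],
}
toy_measured = {
    "frontal": {"G1", "G2"},
    "temporal": {"G2", "G3"},
    "white_matter": {"G4"},
}
expressed_by_region = propagate_expression(toy_children, toy_measured)
```

In this toy example, "cortex" has no direct data and therefore receives the union of the genes found in "frontal" and "temporal", and "brain" in turn receives the genes of all its sub-regions.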
2.4 Enrichment analyses
Genes are annotated to brain regions using default or user-defined expression cut-offs. The default cut-offs are the expression quantiles in 10% steps, computed across all brain regions. The enrichment analysis is then performed by the core function aba_enrich using either a hypergeometric test or a Wilcoxon rank-sum test as implemented in FUNC (Prüfer et al.; Supplementary Figure S1). The full functionality of our package is described in our Bioconductor R vignette: https://bioconductor.org/packages/release/bioc/vignettes/ABAEnrichment/inst/doc/ABAEnrichment.html
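To make the annotation and testing steps concrete, the following sketch computes decile cut-offs, annotates genes to a region at one cut-off and evaluates a one-sided hypergeometric test. It is a simplified Python illustration and does not reproduce aba_enrich or FUNC: the function names and toy values are hypothetical, the cut-offs are assumed to be the 10% to 90% quantiles, and only a raw p-value is computed, without the FWER estimates reported by the package.

```python
from math import comb
from statistics import quantiles


def decile_cutoffs(pooled_values):
    """Return the nine 10%-step quantiles of expression values pooled
    across all brain regions (assumed to correspond to the default cut-offs)."""
    return quantiles(pooled_values, n=10)


def expressed_genes(region_expression, cutoff):
    """Genes whose expression in the region reaches the cut-off."""
    return {gene for gene, value in region_expression.items() if value >= cutoff}


def hypergeom_upper_tail(k, total, annotated, drawn):
    """P(X >= k) when `drawn` candidate genes are sampled without replacement
    from `total` background genes, of which `annotated` are expressed in the region."""
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(total - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total, drawn)


# Toy region with ten background genes and made-up expression values 1..10.
toy_region = {f"G{i}": float(i) for i in range(1, 11)}
toy_cutoffs = decile_cutoffs(list(toy_region.values()))

annotated_genes = expressed_genes(toy_region, cutoff=7.0)  # G7, G8, G9, G10
candidates = {"G1", "G8", "G9"}
overlap = len(candidates & annotated_genes)

p_value = hypergeom_upper_tail(
    k=overlap,
    total=len(toy_region),
    annotated=len(annotated_genes),
    drawn=len(candidates),
)
```

In the package this test is repeated for every brain region and every cut-off, so the raw p-values would still need a multiple-testing correction such as the FWER used in the evaluation.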
3 Evaluation
To determine whether our approach accurately identifies genes that have previously been reported to show enrichment in different brain regions, we tested the software on two datasets (Cahoy et al., 2008; Kang et al., 2011; supplementary data file). First, Cahoy et al. report a set of marker genes for oligodendrocytes. We selected the top 40 markers as candidates and tested for their enrichment in the adult brain using ABAEnrichment. Oligodendrocytes have been reported to be predominantly present in the white matter of the brain (Hagan et al.). Expression enrichment analysis indicated that the 40 marker genes are significantly enriched in multiple regions (min FWER < 0.05), of which the top five enriched regions represent all white matter tissues available (Supplementary Table S7). Second, we used genes that have been shown to be differentially expressed at different developmental stages in multiple human brain regions (Supplementary Table S8, Kang et al., 2011). Of the six brain regions Kang et al. reported, five were measured directly for the Allen Brain Atlas and one (neocortex) has substructures with direct expression measurements (Supplementary Table S9). Strikingly, when we restricted our analysis to regions for which we have direct expression evidence, the region Kang et al. reported had the lowest mean FWER across default expression cut-offs in all cases (Supplementary Table S10). For genes reported to be specifically expressed in neocortex, we find that the brain region with the lowest mean FWER is in all cases a direct sub-region of neocortex.
4 Conclusions
ABAEnrichment provides a method for analyzing gene set expression enrichment in the brain regions assayed by the Allen Brain Atlas projects, and is the first software package to perform genome-wide analyses using this resource. We show that it is possible to use the fine-grained expression profiles provided by the Allen Brain Atlas projects to determine whether sets of candidate genes identified through analyses such as screens for positive selection (Scheinfeldt and Tishkoff, 2013), genome-wide association studies (Visscher et al.) and analyses of archaic introgression (Vattathil and Akey, 2015) are enriched in particular brain regions in the adult or developing brain. These expression patterns might provide insights into the processes in which these genes act, and this information can then be used to guide further functional experiments in cell lines, organoids or model organisms.
Authors: John D Cahoy; Ben Emery; Amit Kaushal; Lynette C Foo; Jennifer L Zamanian; Karen S Christopherson; Yi Xing; Jane L Lubischer; Paul A Krieg; Sergey A Krupenko; Wesley J Thompson; Ben A Barres Journal: J Neurosci Date: 2008-01-02 Impact factor: 6.167
Authors: Hyo Jung Kang; Yuka Imamura Kawasawa; Feng Cheng; Ying Zhu; Xuming Xu; Mingfeng Li; André M M Sousa; Mihovil Pletikos; Kyle A Meyer; Goran Sedmak; Tobias Guennel; Yurae Shin; Matthew B Johnson; Zeljka Krsnik; Simone Mayer; Sofia Fertuzinhos; Sheila Umlauf; Steven N Lisgo; Alexander Vortmeyer; Daniel R Weinberger; Shrikant Mane; Thomas M Hyde; Anita Huttner; Mark Reimers; Joel E Kleinman; Nenad Sestan Journal: Nature Date: 2011-10-26 Impact factor: 49.962
Authors: Michael J Hawrylycz; Ed S Lein; Angela L Guillozet-Bongaarts; Elaine H Shen; Lydia Ng; Jeremy A Miller; Louie N van de Lagemaat; Kimberly A Smith; Amanda Ebbert; Zackery L Riley; Chris Abajian; Christian F Beckmann; Amy Bernard; Darren Bertagnolli; Andrew F Boe; Preston M Cartagena; M Mallar Chakravarty; Mike Chapin; Jimmy Chong; Rachel A Dalley; Barry David Daly; Chinh Dang; Suvro Datta; Nick Dee; Tim A Dolbeare; Vance Faber; David Feng; David R Fowler; Jeff Goldy; Benjamin W Gregor; Zeb Haradon; David R Haynor; John G Hohmann; Steve Horvath; Robert E Howard; Andreas Jeromin; Jayson M Jochim; Marty Kinnunen; Christopher Lau; Evan T Lazarz; Changkyu Lee; Tracy A Lemon; Ling Li; Yang Li; John A Morris; Caroline C Overly; Patrick D Parker; Sheana E Parry; Melissa Reding; Joshua J Royall; Jay Schulkin; Pedro Adolfo Sequeira; Clifford R Slaughterbeck; Simon C Smith; Andy J Sodt; Susan M Sunkin; Beryl E Swanson; Marquis P Vawter; Derric Williams; Paul Wohnoutka; H Ronald Zielke; Daniel H Geschwind; Patrick R Hof; Stephen M Smith; Christof Koch; Seth G N Grant; Allan R Jones Journal: Nature Date: 2012-09-20 Impact factor: 49.962
Authors: Jeremy A Miller; Song-Lin Ding; Susan M Sunkin; Kimberly A Smith; Lydia Ng; Aaron Szafer; Amanda Ebbert; Zackery L Riley; Joshua J Royall; Kaylynn Aiona; James M Arnold; Crissa Bennet; Darren Bertagnolli; Krissy Brouner; Stephanie Butler; Shiella Caldejon; Anita Carey; Christine Cuhaciyan; Rachel A Dalley; Nick Dee; Tim A Dolbeare; Benjamin A C Facer; David Feng; Tim P Fliss; Garrett Gee; Jeff Goldy; Lindsey Gourley; Benjamin W Gregor; Guangyu Gu; Robert E Howard; Jayson M Jochim; Chihchau L Kuan; Christopher Lau; Chang-Kyu Lee; Felix Lee; Tracy A Lemon; Phil Lesnar; Bergen McMurray; Naveed Mastan; Nerick Mosqueda; Theresa Naluai-Cecchini; Nhan-Kiet Ngo; Julie Nyhus; Aaron Oldre; Eric Olson; Jody Parente; Patrick D Parker; Sheana E Parry; Allison Stevens; Mihovil Pletikos; Melissa Reding; Kate Roll; David Sandman; Melaine Sarreal; Sheila Shapouri; Nadiya V Shapovalova; Elaine H Shen; Nathan Sjoquist; Clifford R Slaughterbeck; Michael Smith; Andy J Sodt; Derric Williams; Lilla Zöllei; Bruce Fischl; Mark B Gerstein; Daniel H Geschwind; Ian A Glass; Michael J Hawrylycz; Robert F Hevner; Hao Huang; Allan R Jones; James A Knowles; Pat Levitt; John W Phillips; Nenad Sestan; Paul Wohnoutka; Chinh Dang; Amy Bernard; John G Hohmann; Ed S Lein Journal: Nature Date: 2014-04-02 Impact factor: 49.962