Regina H Reynolds, John Hardy, Mina Ryten, Sarah A Gagliano Taliun.
Abstract
The past decade has seen a surge in the number of disease/trait-associated variants, largely because of the union of studies to share genetic data and the availability of electronic health records from large cohorts for research use. Variant discovery for neurological and neuropsychiatric genome-wide association studies, including schizophrenia, Parkinson's disease and Alzheimer's disease, has greatly benefitted; however, the translation of these genetic association results to interpretable biological mechanisms and models is lagging. Interpreting disease-associated variants requires knowledge of gene regulatory mechanisms and computational tools that permit integration of this knowledge with genome-wide association study results. Here, we summarize key conceptual advances in the generation of brain-relevant functional genomic annotations and in tools that allow integration of these annotations with association summary statistics, which together provide a new and exciting opportunity to identify disease-relevant genes, pathways and cell types in silico. We discuss the opportunities and challenges associated with these developments and conclude with our perspective on future advances in annotation generation, tool development and the union of the two.
Keywords: neurodegenerative disorders; cellular resolution; functional genomic annotations; genome-wide association; neuropsychiatric disorders
Year: 2019 PMID: 31603214 PMCID: PMC6885670 DOI: 10.1093/brain/awz295
Source DB: PubMed Journal: Brain ISSN: 0006-8950 Impact factor: 13.501
Figure 1. Identifying annotations and genes of interest using GWAS summary statistics, relevant functional genomic annotations and genetic tools. To follow up on GWAS risk loci experimentally, the wet lab researcher requires disease-relevant model systems, biological pathways and genes, and ideally some indication of how disease affects gene expression or regulation (i.e. directionality of effect). Identifying annotations of interest and genes of interest are complementary approaches that combine to constrain the model systems, gene targets and gene-specific pathways to pursue in functional experiments. For a description of the various functional genomic annotation types and an overview of the tools see Box 1 and Table 2, respectively.
Table 2. Tools to incorporate GWAS summary statistics with functional genomic annotations
| Tool | Input | Limitation(s) |
|---|---|---|
| fgwas | Summary statistics, annotation file | One ‘causal’ variant per locus |
| GARFIELD | Summary statistics, annotation file, LD file, distance of each variant to the nearest transcription start site | LD reference: LD information provided for UK10K; needs recalculation for non-European studies |
| GoShifter | Variant map file, annotation file, LD file | Critical to match on the number of variants in LD at each locus |
| GREGOR | LD-pruned index variants, annotation file, LD file | One ‘causal’ variant per locus |
| PAINTOR/fastPAINTOR | Locus file, LD matrix file, annotation matrix file | Assumes functional variants are shared at pleiotropic risk regions; if using multiple correlated GWAS traits, there is an assumption that no samples will overlap (often not the case) |
| | | |
| SLDP | Summary statistics, annotation file, LD reference | Limited by the quality of annotations; LD reference to match ancestry of GWAS |
| s-LDSC | Summary statistics, annotation file, LD reference | Restricted to common variants; limited by the quality of annotations; LD reference to match ancestry of GWAS |
| SumHer | Summary statistics, annotation file, LD reference | Limited by the quality of annotations; LD reference to match ancestry of GWAS |
| | | |
| coloc | Summary statistics for GWAS and for eQTL | Restricted by quality of underlying eQTL data; one ‘causal’ variant per locus |
| eCaviar | Summary statistics for GWAS and for eQTL, GWAS LD file, eQTL LD file | Restricted by quality of underlying eQTL data |
| enloc | Summary statistics for GWAS and for eQTL | Restricted by quality of underlying eQTL data |
| moloc | Summary statistics for GWAS and QTL data | Restricted by quality of underlying QTL data; one ‘causal’ variant per locus |
| | | |
| MRBase | Choose input from existing database (or use R package) | Restricted by quality of underlying eQTL data; Mendelian randomization assumptions must be met |
| TWAS/FUSION | Choose input from existing database | Restricted by quality of underlying eQTL data |
Tools are listed in alphabetical order within each group. SLDP = signed linkage disequilibrium profile regression; s-LDSC = stratified LD score regression.
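Several of the tools above take summary statistics and an annotation file as input and ask whether associated variants fall inside the annotation more often than expected. The following is a minimal sketch of that idea only, not the method of any tool in the table: it overlaps variants with annotation intervals and computes a fold enrichment and a one-sided hypergeometric p-value. The function names, the toy data and the single-chromosome input format are assumptions made for illustration. Unlike most of the tools listed, which take an LD file or LD reference as input, this sketch ignores LD entirely and treats variants as independent.

```python
from bisect import bisect_right
from math import comb


def in_annotation(pos, starts, ends):
    """Return True if pos lies in a half-open interval [start, end).

    Assumes intervals are sorted by start and do not overlap.
    """
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def annotation_enrichment(variants, intervals, p_threshold=5e-8):
    """Naive enrichment of significant variants inside an annotation.

    variants:  list of (position, p_value) on a single chromosome (hypothetical format)
    intervals: list of (start, end) annotation intervals on the same chromosome
    Returns (fold_enrichment, hypergeometric_p). LD is NOT accounted for.
    """
    if not variants:
        raise ValueError("no variants supplied")
    intervals = sorted(intervals)
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]

    n_total = len(variants)
    n_annot = n_sig = n_both = 0
    for pos, p in variants:
        annotated = in_annotation(pos, starts, ends)
        significant = p < p_threshold
        n_annot += annotated
        n_sig += significant
        n_both += annotated and significant

    # Observed overlap divided by the overlap expected under independence.
    expected = n_sig * n_annot / n_total
    fold = n_both / expected if expected > 0 else float("nan")

    # Upper tail of the hypergeometric distribution: P(overlap >= n_both).
    tail = sum(
        comb(n_annot, k) * comb(n_total - n_annot, n_sig - k)
        for k in range(n_both, min(n_annot, n_sig) + 1)
    )
    return fold, tail / comb(n_total, n_sig)


# Toy data: three genome-wide significant variants, all inside annotated intervals.
toy_variants = [(100, 1e-9), (150, 1e-10), (300, 0.5), (400, 0.2),
                (500, 1e-8), (600, 0.9), (700, 0.3), (800, 0.6)]
toy_intervals = [(90, 200), (450, 550)]
fold, p_enrich = annotation_enrichment(toy_variants, toy_intervals)
print(f"fold enrichment = {fold:.2f}, p = {p_enrich:.4f}")
```

Because nearby variants are correlated, a count-based test like this one overstates significance on real data; the LD inputs and limitations listed in the table reflect how the actual tools address that problem.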
The 2D space of cellular and molecular resolution. Individual functional genomic annotations can be thought to lie somewhere on the axes of cellular resolution (spanning from whole tissue to single cells) and molecular resolution (spanning across epigenetic phenotypes, transcriptomic phenotypes and intermediates between the two). Points on the plot are purely illustrative, roughly depicting the relative number of functional annotations in each discrete population, with the most annotations currently found in the category of tissue-level steady-state mRNA levels and the least in the category of single cell steady-state isoform levels. To illustrate this categorization, examples of functional annotations highlighted in this review have been labelled. We expect that with future developments, the axis of molecular resolution will become increasingly populated with intermediate phenotypes such as steady-state isoform levels and various other RNA processing steps. Thus, in the future, populations within a category of cellular resolution will become less discrete across molecular phenotypes. For a description of above-mentioned molecular phenotypes and how they are assayed, see Box 1, and for further details on labelled functional annotations, see Table 1. Brain and single cell icons made by Eucalyp, Freepik and Smashicons from www.flaticon.com.
Table 1. Highlighted brain-relevant functional genomic annotations
| Cellular resolution | Molecular phenotype | Web resource | Species |
|---|---|---|---|
| | Chromatin accessibility | BOCA | Human |
| | Chromatin accessibility, gene expression | CommonMind | Human |
| | Gene expression, eQTLs | BrainSeq Consortium | Human |
| | | GTEx | Human |
| | | UKBEC | Human |
| | Gene expression, | Allen Brain Atlas | Human |
| | Chromatin accessibility, epigenetic modifiers, gene expression, QTLs | AMP-AD Knowledge Portal | Mouse, |
| | | Brain xQTL Serve | Human |
| | | FANTOM5 | Mouse, human |
| | | PsychENCODE | Human |
| | | PsychENCODE Knowledge Portal | |
| | | PEC Capstone Collection | |
| | | ROSMAP | Human |
| Homogeneous cell populations | DNA methylation | No web resource. Data available from Gene Expression Omnibus under GSE96615. Processed data also available through UCSC hub. | Human |
| | Gene expression | | Mouse, human |
| | | BRAINcode Project | Human |
| Single-cell analyses | Chromatin accessibility | | Mouse |
| | Gene expression | The Broad Institute | Mouse, human |
| | | | Mouse, human |
| | | | Mouse |
| | | | Mouse |
| | | Tabula Muris | Mouse, human |
| | Chromatin accessibility, gene expression | The BRAIN Initiative | Human |
| | | Human Cell Atlas | Human |
| | | PsychENCODE | Human |
| | | | Human |
| | | No web resource. Data available from Gene Expression Omnibus under GSE97942. | Human |
Selection of disease-related examples of successful application of tools for integration of functional genomic annotations with GWAS summary statistics
| Tool | Disease/Trait | Application and results |
|---|---|---|
| fastPAINTOR | Prostate cancer | Using PAINTOR together with 20 functional categories previously implicated in prostate cancer, a significant enrichment of prostate cancer-associated variants was found in FOXA1-binding sites assayed in the LNCaP cell line (derived from androgen-sensitive human prostate adenocarcinoma cells) and at binding sites for the androgen receptor. |
| | High density lipoprotein / low density lipoprotein / total triglycerides | Using fastPAINTOR targeted at putative pleiotropic regions within the three traits, liver H3K4me1 and H3K27ac were found to have a strong enrichment of ‘causal’ variants shared across all three traits. |
| GARFIELD | Schizophrenia | GARFIELD was applied to 29 diseases/complex traits and to several annotations, including DHSs and histone modifications. Statistically significant enrichments were found for most traits considered, and highlighted clear differences in enrichment patterns amongst traits. Of note to the focus of this review, schizophrenia-associated variants were found enriched in DHSs from blood and foetal brain, and H3K27ac and H3K4me3 predominantly in tissues of the CNS and blood/immune tissue. |
| GoShifter | Rheumatoid arthritis | Enrichment of rheumatoid arthritis-associated loci was found in summit regions of H3K4me3 peaks (a mark of active promoters) in CD4+ T-memory cells, even when jointly analysed with an aggregate of 118 different cell types and tissues (which included >10 other immune cells), while no enrichment was found in the aggregate 118 cell types when conditioned upon CD4+ T-memory cells. |
| GREGOR | Atrial fibrillation | Atrial fibrillation-associated variants were found to be strongly associated with varying active enhancers in adult and foetal heart tissue, e.g. H3K27ac in adult right atrium and left ventricle, and H3K4me1 in foetal heart tissue, highlighting the importance of these loci in transcriptional regulation of the adult heart and development of the foetal heart. |
| | Type 2 diabetes | Among 184 trait- and disease-associated SNP sets (downloaded from the NHGRI-EBI GWAS Catalogue), the only disease found to be significantly enriched in chromatin accessibility QTLs from human pancreatic islet cells was type 2 diabetes. |
| SLDP | Years of education / Crohn’s disease | SLDP was applied to 46 diseases and complex traits together with 382 transcription factor binding annotations spanning 75 transcription factors and 84 cell lines predicted using ENCODE ChIP-seq data. Analyses yielded 77 significant annotation-trait associations, spanning six diseases and complex traits. Of note to the focus of this review, a positive association was found between years of education and genome-wide binding of BCL11A, which has also been identified in rare variant studies of intellectual disability. Less relevant to the review, but of equal importance, other associations detected included a positive association between IRF1 and Crohn’s disease. |
| s-LDSC | Alzheimer’s disease / Parkinson’s disease | Alzheimer’s disease and Parkinson’s disease heritability were found to be enriched in regulatory annotations marking gene activity in immune cells. |
| | Alzheimer’s disease / bipolar disorder / schizophrenia | s-LDSC was applied across 48 diseases and traits, using gene expression and chromatin data from a number of sources, including ENCODE, GTEx, PsychENCODE and the ImmGen project. Of note to the diseases mentioned throughout this review, a significant enrichment of heritability was found for Alzheimer’s disease, bipolar disorder and schizophrenia in myeloid cells, GABAergic (inhibitory) neurons and glutamatergic (excitatory) neurons, respectively. |
| | Schizophrenia | Pyramidal cells, medium spiny neurons and certain interneurons were found to be implicated in schizophrenia, using both s-LDSC and MAGMA (another form of enrichment method). |
| | Parkinson’s disease | Parkinson’s disease heritability was not found to be enriched in the investigated global and regional brain annotations or brain-related cell-type-specific annotations, but was found to be enriched in a curated lysosomal gene set. |
| | | |
| coloc/fgwas | Subclinical atherosclerosis | cIMT and carotid plaque were used as measures of subclinical atherosclerosis. Using coloc, three genes were identified whose gene expression in both early and late advanced atherosclerotic arterial wall was associated with risk of atherosclerosis development. Notably, two of the genes identified associated with GWAS loci where GWAS association |
| coloc/TWAS | Parkinson’s disease | A combination of coloc and TWAS was used to identify 66 genes whose expression or splicing in DLPFC and peripheral monocytes was significantly associated with Parkinson’s disease risk. |
| MR | Parkinson’s disease | Mendelian randomization was used to show that 14 genes associated with mitochondrial function are also associated with Parkinson’s disease risk. |
| moloc | Schizophrenia | moloc was used to identify 52 candidate genes associated with schizophrenia using eQTL and mQTL data derived from dorsolateral prefrontal cortex. |
ChIP = chromatin immunoprecipitation; cIMT = carotid intima-media thickness; DLPFC = dorsolateral prefrontal cortex.
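The colocalization analyses listed above ask whether a GWAS signal and an eQTL signal at the same locus are consistent with a single shared causal variant. The sketch below illustrates that calculation under a single-causal-variant assumption, using Wakefield approximate Bayes factors computed from per-variant effect sizes and standard errors. It is written in the spirit of this class of methods and is not the API or exact implementation of coloc or any other package. The function names, the prior probabilities (`p1`, `p2`, `p12`), the prior effect standard deviation and the toy data are all assumptions chosen for illustration.

```python
from math import exp, log


def log_abf(beta, se, prior_sd=0.15):
    """Log Wakefield approximate Bayes factor (association vs. no association).

    prior_sd is an assumed prior standard deviation of the true effect size.
    """
    v = se * se
    w = prior_sd * prior_sd
    r = w / (v + w)
    z = beta / se
    return 0.5 * (log(1.0 - r) + r * z * z)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))


def colocalization_posteriors(gwas, eqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of five hypotheses for one locus.

    gwas, eqtl: lists of (beta, se) for the same variants in the same order.
    H0: no association; H1: GWAS trait only; H2: expression only;
    H3: both, distinct causal variants; H4: both, one shared causal variant.
    At most one causal variant per trait is assumed.
    """
    if len(gwas) != len(eqtl) or len(gwas) < 2:
        raise ValueError("need the same (>= 2) variants in both data sets")
    l1 = [log_abf(b, s) for b, s in gwas]
    l2 = [log_abf(b, s) for b, s in eqtl]

    # H3 is summed directly over pairs of distinct variants: quadratic in the
    # number of variants, but it avoids a numerically fragile subtraction.
    distinct_pairs = [a + b for i, a in enumerate(l1)
                      for j, b in enumerate(l2) if i != j]
    log_h = {
        "H0": 0.0,
        "H1": log(p1) + logsumexp(l1),
        "H2": log(p2) + logsumexp(l2),
        "H3": log(p1) + log(p2) + logsumexp(distinct_pairs),
        "H4": log(p12) + logsumexp([a + b for a, b in zip(l1, l2)]),
    }
    total = logsumexp(list(log_h.values()))
    return {h: exp(v - total) for h, v in log_h.items()}


# Toy locus with five variants; the GWAS signal sits on the third variant.
gwas_toy = [(0.0, 0.02), (0.0, 0.02), (0.16, 0.02), (0.0, 0.02), (0.0, 0.02)]
# eQTL signal on the same variant (shared) or on the fifth variant (distinct).
eqtl_shared = [(0.0, 0.05), (0.0, 0.05), (0.5, 0.05), (0.0, 0.05), (0.0, 0.05)]
eqtl_distinct = [(0.0, 0.05), (0.0, 0.05), (0.0, 0.05), (0.0, 0.05), (0.5, 0.05)]

pp_shared = colocalization_posteriors(gwas_toy, eqtl_shared)
pp_distinct = colocalization_posteriors(gwas_toy, eqtl_distinct)
print({h: round(p, 3) for h, p in pp_shared.items()})
print({h: round(p, 3) for h, p in pp_distinct.items()})
```

In the toy data, support concentrates on H4 when both signals peak at the same variant and on H3 when they peak at different variants. Real loci often contain several independent signals, which is why Table 2 lists the single-causal-variant assumption as a limitation of several tools.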