Arkadiy K Golov, Nikolay V Kondratyev, George P Kostyuk, and Vera E Golimbet.
Abstract
Recent advances in psychiatric genetics have led to the discovery of dozens of genomic loci associated with schizophrenia. However, a gap exists between the detection of genetic associations and understanding the underlying molecular mechanisms. This review describes the basic approaches used in the so-called post-GWAS studies to generate a biological interpretation of the existing population genetic data, including both molecular (creation and analysis of knockout animals, exploration of the transcriptional effects of common variants in human brain cells) and computational (fine-mapping of causal variability, gene set enrichment analysis, partitioned heritability analysis) methods. The results of the crucial studies, in which these approaches were used to uncover the molecular and neurobiological basis of the disease, are also reported.
Keywords: GWAS; brain epigenomics; causal genetic variants; enhancers; genome/epigenome editing; schizophrenia
Year: 2020 PMID: 31963710 PMCID: PMC7017322 DOI: 10.3390/cells9010246
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
Figure 1. Results of recent schizophrenia genome-wide association studies (GWASs) and basic methods for identification of schizophrenia genes. (A) Manhattan plot for a recent schizophrenia GWAS meta-analysis [11]. Many GWAS regions cover several genes, whereas some are located in intergenic DNA. Two representative GWAS regions are zoomed in; significantly associated polymorphisms are depicted as vertical bars. (B) Statistical fine-mapping of genetic associations. This type of analysis assesses the probability that each polymorphism is causal. Additional epigenetic information can improve prediction accuracy. An idealized fine-mapping of a GWAS region is depicted: only one of 10 genome-wide significant polymorphisms (highlighted with a dashed rectangle) appears to be a credible causal variant. (C) Trans-ethnic GWAS. A trans-ethnic study including human populations of three different ancestries is represented. The picture shows one of 10 variants from an idealized GWAS region as consistently (non-heterogeneously) associated with the phenotype. Such variants are assumed likely to be causal. (D) Study of highly penetrant mutations with brain-related phenotypes. Genes identified as schizophrenia genes by three different approaches of this class are represented. Whole-exome sequencing (WES) studies indicated that rare mutations in SLC6A1 cause schizophrenia [16]. This strongly suggests that expression of SLC6A1 is regulated by schizophrenia-associated common variants, as one of the GWAS regions is located in close vicinity to this gene (200 kb upstream). A rare Mendelian syndrome with psychiatric symptoms confirms the role of TCF4 in schizophrenia development: various disruptive mutations in this gene lead to autosomal dominant Pitt-Hopkins syndrome, characterized in particular by epilepsy and mental retardation [17]. Finally, phenotypes of model animals with deliberately knocked-out genes identified ZNF536 as one of the schizophrenia genes. A ZNF536 double-knockout zebrafish line shows behavioural and neuroanatomical (decreased forebrain volume) changes [18]. (E) Description of transcriptional effects of common variation in human neuronal cells. Converging lines of evidence obtained using these methods indicate that FOXG1 is likely to be regulated by a schizophrenia causal variant. A study of spatial chromatin organization in human fetal brain revealed that one of the schizophrenia GWAS regions interacts with the promoter of FOXG1, located 750 kb from it [19]. A subsequent study showed that the FOXG1-interacting SNP rs1191551 is close to one of the fetal brain ATAC-seq peaks [20]. A functional test (luciferase assay) demonstrated enhancer activity of the genomic fragment harbouring rs1191551; furthermore, this activity was dependent on genotype. The role of this region in regulation of FOXG1 was additionally confirmed by CRISPR-Cas9 deletion of 500 bp surrounding rs1191551 in neural progenitor cells, which led to a significant decrease in expression of FOXG1 but not of any other nearby gene.
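The idea behind panel (B) can be illustrated with a minimal Python sketch. It is not the algorithm of any specific fine-mapping tool: it assumes exactly one causal variant per region, uses Wakefield's approximate Bayes factor computed from per-SNP effect estimates and standard errors, and takes a hypothetical prior effect-size standard deviation of 0.2. The SNP names and summary statistics are invented for illustration only.

```python
import math


def approx_bayes_factor(beta, se, prior_sd=0.2):
    """Wakefield approximate Bayes factor in favour of association.

    prior_sd is an assumed prior standard deviation of the true effect.
    """
    v = se ** 2                # sampling variance of the effect estimate
    w = prior_sd ** 2          # prior variance of the true effect
    z = beta / se
    shrink = w / (v + w)
    return math.sqrt(1.0 - shrink) * math.exp(0.5 * z * z * shrink)


def posterior_inclusion_probs(stats, prior_sd=0.2):
    """Posterior probability that each SNP is THE causal variant.

    Assumes a single causal variant in the region and equal prior
    probability for every SNP, so the PIP is the normalised Bayes factor.
    """
    abfs = {snp: approx_bayes_factor(b, s, prior_sd)
            for snp, (b, s) in stats.items()}
    total = sum(abfs.values())
    return {snp: abf / total for snp, abf in abfs.items()}


def credible_set(pips, coverage=0.95):
    """Smallest set of SNPs whose summed PIP reaches the coverage level."""
    ranked = sorted(pips.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for snp, pip in ranked:
        chosen.append(snp)
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical GWAS region: (effect estimate, standard error) per SNP.
toy_region = {
    "snp_1": (0.100, 0.012),
    "snp_2": (0.070, 0.012),
    "snp_3": (0.066, 0.012),
    "snp_4": (0.060, 0.012),
}

pips = posterior_inclusion_probs(toy_region)
print(credible_set(pips))
```

In this toy region all four SNPs would pass a genome-wide significance threshold, yet nearly all posterior mass falls on the variant with the strongest signal, mirroring the idealized situation drawn in the panel. Real regions with strong LD usually yield much larger credible sets.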
Approaches most commonly used in schizophrenia post-GWAS studies.
| Method | Description | Application in Post-GWAS Studies | Selected Publications |
|---|---|---|---|
| Molecular Biology Techniques | | | |
| RNA-seq | High-throughput sequencing of reverse-transcribed RNA allows quantitative assessment of transcriptional activity for each gene in a given sample. This technology made the most basic molecular phenotype easily measurable. | Direct case-control comparison of brain RNA-seq datasets is used to search for genes with altered expression (see differential expression analysis). When combined with genome-wide genotyping data for a human cohort, brain RNA-seq datasets can be used in e/isoQTL analysis. Alternatively, population-scale RNA-seq data are necessary for construction of brain-specific WGCNA networks, which are then used in gene set enrichment analysis or heritability enrichment analysis. | [ |
| ChIP-seq | This method is essentially chromatin immunoprecipitation (IP) coupled with high-throughput sequencing. It is employed for genome-wide search for DNA sites occupied by proteins of interest. Antibodies against chromatin-interacting proteins of interest are incubated with sheared chromatin, and DNA bound by antibodies is precipitated, purified, and subjected to sequencing. | The approach is used for genome-wide annotation of sequences potentially acting as enhancers. IP with antibodies against enhancer-specific histone modifications (e.g., H3K27ac and H3K4me1) is especially useful. As causal polymorphisms are expected to localize within enhancers, ChIP-seq-predicted neuronal enhancers harbouring schizophrenia-associated SNPs are primary targets for subsequent functional interrogation with the luciferase test and genome/epigenome editing. Alternatively, given the high overall tissue specificity of enhancers, enhancers annotated in a given cell type can be used in heritability enrichment analysis to test the relevance of this cell type for disease development. | [ |
| Chromatin accessibility assays (DNase-seq and ATAC-seq) | The techniques are based on enrichment of sequencing libraries with histone-depleted genomic regions. This is achieved by means of enzymes specifically targeting such sites in chromatin (DNase I and Tn5 transposase) with subsequent preferential amplification of short DNA fragments excised by these enzymes. Accessible chromatin-enriched libraries are subjected to next-generation sequencing. | It is widely assumed that TSS-distal open chromatin regions are colocalized with active enhancers, thus these methods along with ChIP-seq are used for enhancer inference. As causal polymorphisms are expected to localize within enhancers, DNase-seq / ATAC-seq-predicted neuronal enhancers harbouring schizophrenia-associated SNPs are primary targets for subsequent functional interrogation with luciferase test and genome/epigenome editing. Alternatively, given a high level of overall enhancer tissue specificity, enhancers annotated in the cell type can be used in heritability enrichment analysis to test the relevance of this cell type for disease development. | [ |
| High-throughput proximity ligation assays (Hi-C and Promoter Capture Hi-C) | In high-throughput proximity ligation methods, distances between pairs of genomic sites are assessed by means of proximity ligation followed by next-generation sequencing. Hi-C allows measurement of proximity between any pair of genomic sites. Promoter Capture Hi-C offers the opportunity to assess distances between promoters and any other genomic site with reduced sequencing burden compared to Hi-C. | It is believed that promoters spatially interact with their cognate enhancers. Thus, proximity-ligation methods are utilized to infer functional enhancer-promoter links. If an enhancer involved in an enhancer-gene loop in neuronal cells also harbours schizophrenia-associated SNPs, this physical proximity can be used as evidence that the gene has a causal role in the disease. Functional links between this gene and the enhancer containing a schizophrenia-associated genetic variant can then be confirmed with genome/epigenome editing. | [ |
| Episome-based functional reporter assays (luciferase assay and STARR-seq) | A potential enhancer sequence is inserted into a specially designed episome harbouring a reporter gene. Transfection of the construct into cells with subsequent measurement of reporter gene expression levels allows assessment of enhancer activity for a tested sequence in a given cell type. The luciferase test was designed for low-throughput testing of enhancer sequences (one at a time), whereas STARR-seq allows testing of thousands of genomic sites in one experiment. | The luciferase assay is used to confirm regulatory activity of schizophrenia-associated genomic sites predicted to be enhancers in brain cells. Often, such predictions are based on the results of the aforementioned epigenomic methods: ChIP-seq, chromatin accessibility assays or high-throughput proximity ligation assays. In addition, the influence of alternative alleles of schizophrenia-associated SNPs on the activity of the enhancers in which these polymorphic sites reside can be measured with the luciferase assay. STARR-seq can potentially be used to probe all genomic sites for enhancer activity in a given brain cell type. Localisation of schizophrenia-associated variants inside STARR-seq-confirmed brain enhancers can be considered strong evidence of causality. | [ |
| Genome editing (CRISPR-Cas) | In situ targeted manipulation of genomic sequence exploiting a bacterial Cas DNA nuclease (e.g., Cas9) guided by short RNA fragments (gRNA). The currently available CRISPR-Cas tools allow, in some cases, genome editing with single-nucleotide precision and therefore the creation of isogenic models for functional testing of SNPs. Other CRISPR-Cas systems are used to excise short fragments (several hundred bp) of DNA from the genome. | CRISPR-Cas approaches can be used in human neural cells for substitution of individual schizophrenia-associated nucleotides or excision of entire enhancers harbouring such nucleotides. These enhancers are usually predicted with ChIP-seq and/or chromatin accessibility assays. Editing is followed by assessment of changes in expression of the enhancers' cognate genes. Alternatively, genome editing is used for creation of knock-out model animals to test the role of potential schizophrenia genes in brain development and function. | [ |
| Epigenome editing (CRISPRi) | These tools were designed for targeted in situ epigenetic inactivation of regulatory sequences in the genome. This was made possible by abolishing the nuclease activity of Cas9 and fusing this protein with various eukaryotic transcription inhibitory domains (e.g., KRAB domain, MECP2 inhibitory domain). | Epigenome editing is used as a simplified alternative to genome editing for functional confirmation of regulatory activity of enhancers containing schizophrenia-associated polymorphisms. In addition, CRISPRi can be used to search for genes regulated by such enhancers. | [ |
| Statistical fine-mapping of genetic associations (BIMBAM, CAVIAR, FINEMAP, etc.) | This approach is represented by a family of instruments that seeks to determine causal variants in each GWAS region. Basically, fine-mapping algorithms seek to predict which polymorphism in a disease-associated linkage disequilibrium (LD) block better explains association of the entire region with the phenotype. | In some cases, causal SNPs can be confidently identified within schizophrenia GWAS regions with statistical fine-mapping. If such variants are localized outside of the coding regions, their position relative to predicted and functionally confirmed brain enhancers can be assessed. Episome-based functional reporter assays and genome/epigenome editing can be subsequently applied to confirm enhancer activity and find genes controlled by this particular schizophrenia-associated enhancer. | [ |
| Trans-ethnic GWAS meta-analysis | In trans-ethnic GWASs, results of several GWAS experiments, obtained for genetically distant populations, are compared side-by-side. This approach is based on the notion that true causal variants must be associated with the disease in any studied cohort, independent of background LD structure. Thus, trans-ethnic GWASs take advantage of differences in LD structure among various human populations to fine-map causal polymorphisms. | All strategies described for statistical fine-mapping of genetic associations are applicable to trans-ethnic GWASs. | [ |
| Differential expression (DE) analysis | A number of computational tools exist for proper comparison of RNA-seq (or expression microarray) results between different tissues, different experimental conditions or individuals with different phenotypes. Collectively these tools can be referred to as DE analysis. The main output of DE analysis is a list of genes whose expression differs significantly between the compared datasets. | Genes differentially expressed in brains of cases and controls could potentially be involved in schizophrenia development. However, it is extremely hard to pinpoint truly causal genes among thousands of genes found to be differentially expressed between these two cohorts. In recent years, this strategy has been largely replaced by transcriptome-wide association studies and e/isoQTL analysis. | [ |
| e/isoQTL analysis | Joint analysis of RNA-seq data and matched genome-wide genotyping results obtained from a cohort of individuals allows discovery of relationships between SNPs and levels of gene expression in the studied tissue. SNPs that significantly influence the expression level or splicing pattern of a gene are called eQTLs and isoQTLs, respectively. | Originally, it was assumed that SNPs associated with schizophrenia that are also brain e/isoQTLs are highly likely to be causal variants, and that genes regulated by such SNPs in the brain are credible schizophrenia genes. However, accumulation of data on e/isoQTLs in the human brain (thousands of such SNPs have now been detected) has led to the notion that e/isoQTLs can co-localize with disease-associated variants by chance. Therefore, more rigorous approaches are now used to reliably confirm colocalization of GWAS and e/isoQTL signals (see “Colocalization tests”). | [ |
| Transcriptome-wide association study (TWAS) | Joint analysis of GWAS summary statistics and e/isoQTL summary statistics makes it possible to infer genetically determined differences in expression levels of all genes between the cases and controls of a GWAS in a given tissue (the tissue used in the e/isoQTL analysis). The output of a TWAS is a list of genes whose expression differs significantly between cases and controls. | TWASs, based on schizophrenia GWASs and e/isoQTL analysis of human neuronal tissues, predict genes regulated by schizophrenia-associated polymorphisms. Essentially, this is the definition of schizophrenia causal genes. However, owing to LD, some TWAS-detected genes can be controlled by polymorphisms that are merely linked to causal ones. To account for these artefacts, additional tests confirming colocalization of GWAS and e/isoQTL signals are usually conducted (see “Colocalization tests”). | [ |
| Colocalization tests (SMR/HEIDI, Sherlock, coloc, etc.) | Colocalization tests are statistical tools used to verify whether the association of a given polymorphism with two different phenotypes (e.g., disease and the RNA level of a specific gene in eQTL analysis) is based on LD between two different causal SNPs or on actual pleiotropy of one genetic variant. | Colocalization tests are often employed to confirm colocalization of a schizophrenia GWAS signal and a signal from neuronal e/isoQTL analysis. The same approach is used both in simple e/isoQTL analysis and in TWASs. Given the rapid growth of both GWAS and e/isoQTL datasets, the risk of random colocalization of signals increases, which can lead to false-positive schizophrenia genes. Therefore, the importance of colocalization tests in these approaches has been recognized in recent years. | [ |
| Weighted gene co-expression network analysis (WGCNA) | WGCNA is a data-driven method used for extraction of information, regarding gene sets, from expression data. In WGCNA, a number of RNA-seq (or expression microarray) datasets from the same tissue of different individuals is analysed. Alternatively, in some cases, information from various tissues can be used. Correlations in expression of all possible gene pairs are calculated, then correlation-based clustering of genes is performed. Clusters (modules) of tightly correlated (co-expressed) genes are assumed to represent biologically meaningful gene sets. | Modules detected with WGCNA analysis in human brains are useful gene sets, which are widely used in gene set enrichment analysis and partitioned heritability analysis. These methods allow detection of WGCNA modules relevant to schizophrenia development. | [ |
| Gene set enrichment analysis (MAGMA, FORGE, ALLIGATOR, MAGENTA, INRICH) | Gene set enrichment analysis (GSEA) is a toolbox of algorithms (e.g., MAGMA, FORGE, ALLIGATOR, MAGENTA, INRICH) used for inference of causal disease gene sets from GWAS summary statistics. Basically, gene-level p-values of disease association are calculated with these algorithms. Then, a list of studied gene sets and genes falling in each of these gene sets are submitted to the algorithm. Association of each gene set is assessed, based on gene-level p-values. Gene sets which survive multiple comparison adjustments are considered to be disease-relevant. Gene sets used in GSEA can be derived from various sources: curated databases (see | Various GSEA algorithms are used in schizophrenia post-GWAS studies to detect disease-relevant molecular networks and cell types. Among the most commonly used gene sets in this kind of analysis are: brain-derived WGCNA modules, genes specifically expressed in various cell populations, gene sets associated with neurological and behavioural changes in mice (from MGD database, see | [ |
| Partitioned heritability analysis | Partitioned heritability analysis is an alternative to GSEA for detecting phenotype-relevant gene sets or any other subset of genomic regions (ChIP-seq or ATAC-seq peaks, introns, exons, etc.). In this algorithm, heritability explained by certain types of genomic regions is compared with heritability explained by randomly sampled genomic regions. Regions significantly enriched in GWAS-derived disease heritability are assumed to be disease-relevant. All remarks about gene sets used in GSEA are applicable to partitioned heritability analysis. | All strategies described for gene set enrichment analysis are applicable to partitioned heritability analysis. Additionally, enhancer markers (derived from ChIP-seq and/or chromatin accessibility assays) for various tissues can be used to infer schizophrenia-relevant cell types. | [ |
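The gene-set logic summarized in the last rows of the table can be sketched in Python as a permutation-based competitive test: gene-level p-values are converted to z-scores, and the mean z-score of a candidate gene set is compared with that of randomly drawn gene sets of the same size. This is a deliberately simplified stand-in, not the procedure of MAGMA or the other listed tools, which additionally correct for gene size, SNP density and LD between neighbouring genes. The gene names, p-values, number of permutations and seed are all hypothetical.

```python
import random
from statistics import NormalDist, mean


def gene_z_scores(gene_pvalues):
    """Map gene-level association p-values to z-scores (small p -> large z)."""
    nd = NormalDist()
    return {gene: -nd.inv_cdf(p) for gene, p in gene_pvalues.items()}


def competitive_enrichment(gene_pvalues, gene_set, n_perm=2000, seed=0):
    """Return (mean z of the set, empirical one-sided p-value).

    The p-value is the fraction of random same-size gene sets whose mean
    z-score is at least as large as that of the tested set.
    """
    z = gene_z_scores(gene_pvalues)
    members = [g for g in gene_set if g in z]
    if not members:
        raise ValueError("gene set has no genes with a p-value")
    observed = mean(z[g] for g in members)

    rng = random.Random(seed)
    all_genes = sorted(z)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(members))
        if mean(z[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy gene-level p-values: four strongly associated genes plus background.
toy_pvalues = {
    "GENE_00": 1e-6, "GENE_01": 5e-6, "GENE_02": 1e-5, "GENE_03": 2e-4,
}
toy_pvalues.update(
    {f"GENE_{k + 4:02d}": 0.05 + 0.055 * k for k in range(16)}
)

enriched_set = ["GENE_00", "GENE_01", "GENE_02", "GENE_03"]
background_set = ["GENE_08", "GENE_09", "GENE_10", "GENE_11"]

print(competitive_enrichment(toy_pvalues, enriched_set))
print(competitive_enrichment(toy_pvalues, background_set))
```

Comparing against randomly sampled sets is the same design choice that the table describes for partitioned heritability analysis, where heritability in an annotation is judged against randomly sampled genomic regions. In practice, the candidate sets would come from sources such as WGCNA modules or curated pathway databases rather than hand-picked lists.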
Databases and other valuable datasets widely used in schizophrenia post-GWAS studies.
| Resource | Type of Information | Description | Link |
|---|---|---|---|
| Psychiatric genomics consortium (PGC) | GWAS results | Data on PGC-conducted GWASs for schizophrenia and various other common psychiatric diseases. Summary statistics are publicly available. | [ |
| MRC centre for neuropsychiatric genetics and genomics | GWAS results | Publicly available summary statistics of the largest published meta-analysis of schizophrenia GWASs. | [ |
| ENCODE (Encyclopedia of DNA elements) | Epigenomic and transcriptomic datasets, regulatory annotations | Raw and processed data on gene expression and chromatin structure in various human and mouse cell types. Integrative annotation of regulatory elements in dozens of cell types is also available. All datasets are publicly accessible. | [ |
| Roadmap Epigenomics project | Epigenomic and transcriptomic datasets | Raw and processed data on gene expression and chromatin structure in human stem cells and primary ex vivo tissues. All datasets are publicly available. | [ |
| FANTOM5 (Functional annotation of the mammalian genome) | Transcriptomic datasets, regulatory annotations | Comprehensive data on RNA expression in different mammalian cell types. Annotations of promoters, enhancers and promoter-enhancer links are compiled. All datasets are publicly available. | [ |
| GTEx (the genotype-tissue expression project) | Transcriptomic datasets | Genome-wide expression profiles for 54 non-diseased tissues of a human body. | [ |
| CommonMind consortium knowledge portal | Genotype data, epigenomic and transcriptomic datasets | Expression data with matched genotype and ATAC-seq data from hundreds of postmortem brain samples from donors with schizophrenia, bipolar disease, and individuals with no neuropsychiatric disorders. Access to raw data is controlled. Results of differential expression and eQTL analysis are publicly available. | [ |
| PsychENCODE consortium knowledge portal | Genotype data, epigenomic and transcriptomic datasets, system-level integrative models | Epigenomic and transcriptomic datasets from hundreds of brain samples from donors with psychiatric conditions and individuals with no neuropsychiatric diagnosis on different ontogenetic stages. Raw data is access-controlled. Outputs of various types of follow-up analysis (eQTL, TWAS, WGCNA, cell type-specific regulatory networks, etc.) are publicly available. | [ |
| KEGG (Kyoto encyclopedia of genes and genomes) pathways database | Collection of annotated gene sets | Publicly available curated functional gene sets. | [ |
| GO (gene ontology) database | Collection of annotated gene sets | Publicly available lists of genes annotated by GO consortium as sharing “molecular function”, residing in the same “cellular component” or participating in the same “biological process”. | [ |
| MGD (Mouse Genome Database, part of Mouse Genome Informatics) | Collection of annotated gene sets | Gene sets compiled by MGD, based on the comprehensive catalogue of mouse mutations and phenotypes caused by these mutations. | [ |