Wenran Li1,2, Zhana Duren1, Rui Jiang3, Wing Hung Wong4.
Abstract
A person's genome typically contains millions of variants, which represent the differences between this personal genome and the reference human genome. The interpretation of these variants, i.e., the assessment of their potential impact on a person's phenotype, is currently of great interest in human genetics and medicine. We have developed a prioritization tool called OpenCausal, which takes as inputs 1) a personal genome and 2) a reference context-specific TF expression profile and returns a list of noncoding variants prioritized according to their impact on chromatin accessibility for any given genomic region of interest. We applied OpenCausal to 6,430 samples across 18 tissues derived from the GTEx project and found that the variants prioritized by OpenCausal are highly enriched for eQTLs and caQTLs. We further propose a strategy to integrate the predicted open scores with genome-wide association study (GWAS) data to prioritize putative causal variants and regulatory elements for a given risk locus (i.e., fine-mapping analysis). As an initial example, we applied this method to a GWAS dataset of human height and found that the prioritized putative variants and elements correlate with the phenotype (i.e., the heights of individuals) better than the other variants and elements do.
Keywords: GWAS; fine-mapping analysis; noncoding variants; personal genome; regression model
Year: 2020 PMID: 32817564 PMCID: PMC7474608 DOI: 10.1073/pnas.1922703117
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Model design. (A) Schematic overview of the OpenCausal approach. OpenCausal captures the change in chromatin accessibility caused by a variant, where the variation is derived from WGS data. (B) Schematic overview of the Ropen model. Ropen is a sequence-based regression model that predicts a chromatin accessibility score for a regulatory element (RE) using the expression of the TFs that bind to this region.
Fig. 2. Performance of gene expression prediction for GTEx samples. (A) Comparison of prediction using WGS-based chromatin accessibility scores with prediction using REF-based chromatin accessibility scores, in terms of cross-sample correlation. (B) Performance of expression prediction on genes involved in eQTL interactions. (C) Performance of expression prediction on genes involved in eQTL interactions whose variants are located in REs.
Fig. 3. OpenCausal detects causal variants for REs. (A) Percentage of detected causal variants in different tissues. Reference SNP (rs) denotes variants that overlap reported reference SNPs. Novel SNP (ns) denotes variants that have not been reported as reference SNPs. (B) Schematic overview of Fisher’s exact test. (C) Validation of detected causal variants using tissue-specific eQTL data. (D) Validation of detected causal variants using caQTL data. Odds ratio is the odds ratio calculated from Fisher’s exact test. Percentage is the percentage of eQTL/caQTL variants covered by the detected causal variants. (E and F) Schematic overviews of the direct regulatory mechanism (E) and the indirect regulatory mechanism (F) for the interpretation of eQTL interactions. (G) Percentage of interpreted tissue-specific eQTL interactions.
Fig. 4. Workflow of the prioritization of genetic variants for a GWAS trait.
Fig. 5. Validation of prioritized putative causal variants for height GWAS. (A) Comparison between the K top-ranked putative causal variants and the K bottom-ranked variants (K = 20, 30, and 40) in the IGF1-related risk locus. The y axis is the absolute log value of the fold change between the average height of donors with the minor allele and that of donors with the major allele (i.e., |log FC|). *P < 0.05. (B) Comparison between top-ranked variants and bottom-ranked variants for 2,953 risk loci. Each dot represents a risk locus. The y-axis value of each dot is the average |log FC| of the 40 top-ranked variants in this locus, and the x-axis value is that of the 40 bottom-ranked variants.
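Two of the summary statistics named in the captions for Fig. 3 and Fig. 5 can be illustrated with a short Python sketch. It is not the authors' code. The 2x2 table layout, the one-sided (enrichment) form of Fisher's exact test, the use of the natural logarithm, the grouping of donors by a boolean carrier flag, and the function names `fisher_enrichment` and `abs_log_fc` are all assumptions made for illustration, since the record does not specify them.

```python
import math
from statistics import mean


def fisher_enrichment(a, b, c, d):
    """Odds ratio and one-sided Fisher's exact p-value for a 2x2 table.

    Assumed layout (hypothetical, not taken from the paper):
                        QTL    not QTL
      prioritized        a        b
      not prioritized    c        d
    """
    odds_ratio = (a * d) / (b * c) if b * c > 0 else math.inf
    row1, col1, total = a + b, a + c, a + b + c + d
    # Hypergeometric upper tail: probability of observing at least `a`
    # QTL variants among the prioritized ones if there were no enrichment.
    numerator = sum(
        math.comb(col1, k) * math.comb(total - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    p_value = numerator / math.comb(total, row1)
    return odds_ratio, p_value


def abs_log_fc(heights, carries_minor_allele):
    """|log FC| between mean height of minor-allele and major-allele donors.

    Assumes the natural log and one boolean flag per donor; how
    heterozygous donors are grouped is left to the caller.
    """
    minor = [h for h, m in zip(heights, carries_minor_allele) if m]
    major = [h for h, m in zip(heights, carries_minor_allele) if not m]
    return abs(math.log(mean(minor) / mean(major)))


if __name__ == "__main__":
    print(fisher_enrichment(3, 1, 1, 3))
    print(abs_log_fc([180, 180, 160, 160], [True, True, False, False]))
```

Under these assumptions, an odds ratio above 1 with a small p-value would indicate that prioritized variants overlap QTLs more often than expected by chance, and a larger |log FC| would indicate a larger height difference between the two allele groups, regardless of its direction.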