Daniel M Jordan, Marie Verbanck, Ron Do.
Abstract
Horizontal pleiotropy, where one variant has independent effects on multiple traits, is important for our understanding of the genetic architecture of human phenotypes. We develop a method to quantify horizontal pleiotropy using genome-wide association summary statistics and apply it to 372 heritable phenotypes measured in 361,194 UK Biobank individuals. Horizontal pleiotropy is pervasive throughout the human genome, prominent among highly polygenic phenotypes, and enriched in active regulatory regions. Our results highlight the central role horizontal pleiotropy plays in the genetic architecture of human phenotypes. The HOrizontal Pleiotropy Score (HOPS) method is available on GitHub at https://github.com/rondolab/HOPS.
Keywords: GWAS; Genetic architecture; Pleiotropy; Polygenicity; R package; Statistical method
Year: 2019 PMID: 31653226 PMCID: PMC6815001 DOI: 10.1186/s13059-019-1844-7
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 17.906
Fig. 1 Schematic of different types of pleiotropy. Previous studies distinguish between vertical pleiotropy, where effects on one trait are mediated through effects on another trait, and horizontal pleiotropy, where effects on multiple traits are independent.
Fig. 2 Contributions of linkage disequilibrium (LD) and polygenicity to horizontal pleiotropy. In addition to horizontal pleiotropy in the usual sense, both LD and polygenicity are expected to contribute to horizontal pleiotropy. In LD-induced horizontal pleiotropy, two linked SNVs have independent effects on different traits, and these effects appear pleiotropic because of the linkage between the SNVs. In polygenicity-induced horizontal pleiotropy, two highly polygenic traits overlap in their polygenic footprint.
Fig. 3 Two-component pleiotropy score method. We (i) collect association statistics from the UK Biobank, (ii) process them using Mahalanobis whitening, (iii) compute the two components of our pleiotropy score (P_m and P_n) based on the whitened association statistics, (iv) use LD scores to correct both components for LD-induced pleiotropy, and (v) use permutation-based P values to correct both components for polygenic architecture.
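Steps (ii) and (iii) of this pipeline can be illustrated with a toy example. The sketch below is a minimal, standard-library-only illustration for two traits with a known correlation `r`; it is not the HOPS implementation. The function names, the threshold of 2, and the definitions of the two components (Euclidean norm of the whitened Z-scores, and the number of whitened Z-scores whose absolute value exceeds the threshold) are assumptions made for illustration.

```python
import math

def whitening_matrix_2x2(r):
    """Return Sigma^(-1/2) for Sigma = [[1, r], [r, 1]], assuming |r| < 1."""
    s = math.sqrt(1.0 - r * r)      # sqrt(det Sigma)
    t = math.sqrt(2.0 + 2.0 * s)    # sqrt(trace Sigma + 2 * sqrt(det Sigma))
    # For a 2x2 positive-definite matrix, sqrt(Sigma) = (Sigma + s*I) / t.
    a = (1.0 + s) / t               # diagonal entry of sqrt(Sigma)
    b = r / t                       # off-diagonal entry of sqrt(Sigma)
    # det(sqrt(Sigma)) equals s, so the 2x2 inverse is available in closed form.
    return [[a / s, -b / s], [-b / s, a / s]]

def whiten(z, w):
    """Apply the whitening matrix w to a pair of Z-scores z."""
    return [w[0][0] * z[0] + w[0][1] * z[1],
            w[1][0] * z[0] + w[1][1] * z[1]]

def score_components(z_white, threshold=2.0):
    """Hypothetical components: total magnitude and number of traits affected."""
    magnitude = math.sqrt(sum(v * v for v in z_white))
    n_traits = sum(1 for v in z_white if abs(v) > threshold)
    return magnitude, n_traits

# Example: one variant with Z-scores of 3 on two traits correlated at r = 0.6.
w = whitening_matrix_2x2(0.6)
z_white = whiten([3.0, 3.0], w)
print(z_white, score_components(z_white))
```

Whitening removes the correlation between the traits before scoring, so association signals that are shared only because the traits themselves are correlated contribute less to the score than they would on the raw Z-scores.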
Fig. 4 Simulation study showing the false positive rate (a, b, c, d) and power (e, f, g, h) of the two-component pleiotropy score. The top row shows performance on non-pleiotropic simulated variants (black line shows 5% false positive rate); the bottom row shows performance on pleiotropic variants (black line shows 80% power). Simulations were run for both score components (left and right), both without correction for polygenicity (a, c, e, g) and with the correction (b, d, f, h), with per-variant heritability ranging from 0.0002 to 0.2, proportion of non-pleiotropic causal loci ranging from 0 to 1%, and proportion of pleiotropic causal loci ranging from 0.1 to 1%. Our method has good power to detect pleiotropy for highly heritable traits, though its power is reduced by extreme polygenicity. Extreme polygenicity also increases the false positive rate, though this effect is corrected by our polygenicity correction.
Fig. 5 Quantile-quantile (Q-Q) plots showing the inflation of the pleiotropy score as a function of polygenicity. Variants are stratified into 4 batches of about 80 traits each by heritability, and then subdivided into 5 batches of about 20 traits each by polygenicity, as measured by the corrected genomic inflation factor. Darker shades represent low polygenicity and lighter shades represent high polygenicity. All panels show −log10 transformed P values. The black lines show the expected value under the null hypothesis.
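For readers unfamiliar with these quantities, the following sketch shows how a genomic inflation factor and the points of a Q-Q plot are conventionally computed. It uses the standard, uncorrected definition of the inflation factor (median observed chi-square statistic divided by the median of a 1-degree-of-freedom chi-square distribution); the correction applied in the figure is not reproduced here, and the function names are hypothetical.

```python
import math
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(z_scores):
    """Standard (uncorrected) genomic inflation factor from association Z-scores."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN

def qq_points(p_values):
    """Pair expected and observed -log10(P) values, both sorted ascending."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Under the null, P values are uniform; use mid-point plotting positions.
    expected = sorted(-math.log10((i + 0.5) / n) for i in range(n))
    return list(zip(expected, observed))
```

A value near 1 indicates no inflation. Values above 1 can reflect either confounding or genuine polygenic signal, which is why the inflation factor can serve as a proxy for polygenicity.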
Fig. 6 Distribution of the pleiotropy score among variants (a), genes (b), and traits (c). a The global distribution of the two score components (left and right) for the 767,057 tested variants. The expected distribution under the null hypothesis of no pleiotropy is shown in red and the observed distribution is shown in blue. The vertical line represents the value of the pleiotropy score corresponding to genome-wide significance (P < 5 × 10^-8). For clarity, 1769 and 643 variants, respectively, are not shown because they have extreme values for the pleiotropy score. b The distribution of the average pleiotropy score for coding variants in each gene for the two score components (left and right). The top ten genes are represented on the right side of the plots, whereas genes with a pleiotropy score of 0 are represented on the left side of the plots. c The contribution of pleiotropic variants to 82 complex traits and diseases. The contribution of pleiotropic variants is calculated as the correlation coefficient between the absolute value of Z-scores and the pleiotropy score among variants that are genome-wide significant for the pleiotropy score (P < 5 × 10^-8 for the respective score component).
Functional enrichment analysis of pleiotropy score
| Annotation source | Group | Annotation | LD-corrected score, component 1 | LD-corrected score, component 2 |
|---|---|---|---|---|
| Variant effect predictor | | UTR | +0.24 (± 0.01) | +0.69 (± 0.02) |
| | | Coding synonymous | +0.24 (± 0.01) | +0.61 (± 0.03) |
| | | Non-synonymous | +0.19 (± 0.01) | +0.48 (± 0.03) |
| Roadmap Epigenomics | | H3K27ac | +0.20 (± 0.01) | +0.54 (± 0.01) |
| | | H3K27me3 | +0.02 (± 0.01) | +0.01 (± 0.01) |
| | | Active TSS | +0.20 (± 0.02) | +0.54 (± 0.04) |
| | Promoter | Promoter Upstream TSS | +0.16 (± 0.01) | +0.43 (± 0.02) |
| | | Promoter Downstream TSS 1 | +0.35 (± 0.01) | +0.92 (± 0.03) |
| | | Promoter Downstream TSS 2 | +0.30 (± 0.01) | +0.86 (± 0.03) |
| | Transcription | Transcribed - 5′ preferential | +0.29 (± 0.01) | +0.88 (± 0.01) |
| | | Strong transcription | +0.38 (± 0.01) | +1.10 (± 0.01) |
| | | Transcribed - 3′ preferential | +0.29 (± 0.01) | +0.82 (± 0.01) |
| | | Weak transcription | +0.21 (± 0.01) | +0.60 (± 0.01) |
| | Transcription and regulation | Transcribed and regulatory (Prom/Enh) | +0.36 (± 0.01) | +1.00 (± 0.02) |
| | | Transcribed 5′ preferential and Enh | +0.35 (± 0.01) | +1.00 (± 0.01) |
| | | Transcribed 3′ preferential and Enh | +0.33 (± 0.01) | +0.92 (± 0.02) |
| | | Transcribed and Weak Enhancer | +0.32 (± 0.01) | +0.97 (± 0.01) |
| | Active enhancer | Active Enhancer 1 | +0.13 (± 0.01) | +0.32 (± 0.01) |
| | | Active Enhancer 2 | +0.11 (± 0.01) | +0.28 (± 0.01) |
| | | Active Enhancer Flank | +0.11 (± 0.01) | +0.29 (± 0.01) |
| | Weak enhancer | Weak Enhancer 1 | +0.07 (± 0.01) | +0.16 (± 0.01) |
| | | Weak Enhancer 2 | +0.08 (± 0.01) | +0.23 (± 0.01) |
| | | Primary H3K27ac possible Enhancer | +0.09 (± 0.01) | +0.24 (± 0.01) |
| | | Primary DNase | +0.03 (± 0.01) | +0.05 (± 0.01) |
| | | ZNF genes & repeats | +0.08 (± 0.01) | +0.20 (± 0.04) |
| | | Heterochromatin | −0.20 (± 0.01) | −0.61 (± 0.01) |
| | | Poised Promoter | +0.05 (± 0.01) | +0.09 (± 0.01) |
| | | Bivalent Promoter | +0.17 (± 0.01) | +0.51 (± 0.03) |
| | | Repressed Polycomb | +0.04 (± 0.01) | +0.06 (± 0.01) |
| | | Quiescent/Low | −0.41 (± 0.01) | −1.20 (± 0.01) |
| GTEx - number of genes the variant is an eQTL for | | eGenes < 10 | +0.11 (± 0.01) | +0.28 (± 0.01) |
| | | eGenes > 10 & < 15 | +0.19 (± 0.01) | +0.52 (± 0.02) |
| | | eGenes > 15 & < 20 | +0.31 (± 0.02) | +0.88 (± 0.06) |
| | | eGenes > 20 | +0.66 (± 0.06) | +2.07 (± 0.18) |
| GTEx - number of tissues the variant is an eQTL for | | eTissue < 30 | +0.10 (± 0.01) | +0.26 (± 0.01) |
| | | eTissue > 30 & < 35 | +0.21 (± 0.01) | +0.54 (± 0.02) |
| | | eTissue > 35 & < 40 | +0.36 (± 0.02) | +1.13 (± 0.06) |
| | | eTissue > 40 | +0.35 (± 0.05) | +0.97 (± 0.14) |
| International Mouse Phenotyping Consortium | | Phenotypes > 1 | +0.06 (± 0.01) | +0.19 (± 0.04) |
| | | Phenotypes > 1 | +0.09 (± 0.01) | +0.26 (± 0.03) |
We grouped variants by (i) molecular function as annotated by Ensembl, (ii) predicted chromatin state as annotated by the NIH Roadmap Epigenomics Project, (iii) transcriptional effects as annotated by the NIH Genotype-Tissue Expression (GTEx) Project, and (iv) effects on model organism phenotypes as annotated by the International Mouse Phenotyping Consortium (IMPC) and the Saccharomyces Cerevisiae Morphological Database (SCMD). For each grouping, we computed the mean LD-corrected pleiotropy score and used a two-sample Student's t test to determine whether the mean was significantly different from the baseline. We found (i) that coding regions have higher pleiotropy scores than noncoding regions, (ii) that active promoters and enhancers have the highest pleiotropy scores and quiescent and heterochromatin regions have the lowest, (iii) that variants that control expression of more genes in more tissues have higher pleiotropy scores, and (iv) that genes associated with more than one model organism phenotype have higher pleiotropy scores.
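The comparison of an annotation group against a baseline can be sketched as a pooled-variance (Student's) two-sample t test. The example below is a minimal illustration, not the authors' analysis code: the function name is hypothetical, and because the Python standard library has no t distribution, the P value uses a normal approximation that is only reasonable for large groups such as genome-wide variant sets.

```python
import math
import statistics

def pooled_t_test(group, baseline):
    """Two-sample Student's t statistic with a large-sample P value."""
    n1, n2 = len(group), len(baseline)
    m1, m2 = statistics.mean(group), statistics.mean(baseline)
    v1, v2 = statistics.variance(group), statistics.variance(baseline)
    # Student's test assumes equal variances and pools them.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = (m1 - m2) / se
    # Assumption: normal approximation to the t distribution (large n1 + n2).
    p_approx = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return m1 - m2, se, t, p_approx
```

Calling `pooled_t_test(scores_in_annotation, scores_in_baseline)` returns the difference in means, its standard error, the t statistic, and the approximate two-sided P value; both argument names are placeholders for lists of per-variant scores.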
Fig. 7 Replication analysis for the genome-wide pleiotropy study. We used 372 UK Biobank heritable medical traits as our discovery dataset, and independent datasets of 73 complex traits and diseases and 430 blood metabolites as replication datasets. In each case, the expected fraction of replication was determined empirically using a permutation analysis.
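One way such an empirical expectation can be obtained is sketched below. This is an assumed permutation scheme for illustration only, since the caption does not specify the procedure: random variant sets of the same size as the discovery hit set are drawn, and the fraction of each that is also a hit in the replication dataset is recorded. All names and parameters are hypothetical.

```python
import random

def replication_fraction(discovery_hits, replication_hits):
    """Fraction of discovery hits that are also hits in the replication data."""
    hits = set(discovery_hits)
    return len(hits & set(replication_hits)) / len(hits)

def expected_replication(all_variants, discovery_hits, replication_hits,
                         n_perm=1000, seed=0):
    """Observed fraction, permutation-based expected fraction, empirical P value."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    variants = list(all_variants)
    k = len(set(discovery_hits))
    replicated = set(replication_hits)
    observed = replication_fraction(discovery_hits, replication_hits)
    null = []
    for _ in range(n_perm):
        # Assumption: the null draws k variants uniformly without replacement.
        sample = rng.sample(variants, k)
        null.append(sum(v in replicated for v in sample) / k)
    expected = sum(null) / n_perm
    # Add-one correction keeps the empirical P value strictly positive.
    p_emp = (sum(f >= observed for f in null) + 1) / (n_perm + 1)
    return observed, expected, p_emp
```

Comparing the observed fraction with the permutation-based expectation shows how much replication exceeds what random variant sets of the same size would achieve.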