
ColocQuiaL: A QTL-GWAS colocalization pipeline.

Brian Y Chen, William P Bone, Kim Lorenz, Michael Levin, Marylyn D Ritchie, Benjamin F Voight.

Abstract

SUMMARY: Identifying genomic features responsible for genome-wide association study (GWAS) signals has proven to be a difficult challenge; many researchers have turned to colocalization analysis of GWAS signals with expression quantitative trait loci (eQTL) and splicing quantitative trait loci (sQTL) to connect GWAS signals to candidate causal genes. The ColocQuiaL pipeline provides a framework to perform these colocalization analyses at scale across the genome and returns summary files and locus visualization plots to allow for detailed review of the results. As an example, we used ColocQuiaL to perform colocalization between the latest type 2 diabetes GWAS data and Genotype-Tissue Expression (GTEx) v8 single-tissue eQTL and sQTL data.
AVAILABILITY AND IMPLEMENTATION: ColocQuiaL is primarily written in R and is freely available on GitHub: https://github.com/bvoightlab/ColocQuiaL.
© The Author(s) 2022. Published by Oxford University Press.

Year:  2022        PMID: 35894642      PMCID: PMC9477517          DOI: 10.1093/bioinformatics/btac512

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.931


1 Introduction

Genome-wide association studies (GWAS) conducted on large populations have identified a plethora of associations between genetic variation and complex traits and diseases in humans (Buniello et al., 2019). From this collection of predominantly non-coding variants, a central challenge has emerged: identifying which genomic features at each locus ultimately play a functional role in the phenotype of interest. The lack of this insight is a key barrier to initiating functional follow-up experiments. One way to link GWAS associations to a predicted effector transcript is to connect them with molecular phenotype quantitative trait loci (QTLs). A well-powered source of two important types of QTLs, those associated with variation in expression of transcripts (eQTLs) and those associated with the proportion of alternatively spliced transcripts (sQTLs), was reported across >40 tissues by the Genotype-Tissue Expression (GTEx) project (Carithers et al., 2015). To connect trait signals to these data and identify potential candidate genes, the community has turned to statistical colocalization, an approach designed to infer whether the association signals for a complex trait and a QTL are tagged by the same genetic variant(s) (Giambartolomei et al., 2014). To provide a common, reproducible framework for performing colocalization analyses between QTL and complex trait data at moderate computational scale, we present ColocQuiaL, an implementation that allows rapid execution of colocalization analyses between GWAS signals from a summary statistics file and QTL signals from the eQTL or sQTL datasets of a user's choosing. As a proof of concept, we applied it to a large catalog of lead associations and summary data for type 2 diabetes (T2D) and the GTEx v8 single-tissue eQTL and sQTL datasets (Mahajan et al.; Vujkovic et al., 2020).

2 ColocQuiaL

The motivation underlying the development of ColocQuiaL was the need to perform, and visualize the results from, a large number (10 000+) of colocalization analyses between signals for one or more complex traits and the catalog of available human tissue QTL data (Fig. 1). To this end, ColocQuiaL automates the execution of COLOC to perform colocalization analyses between GWAS signals for any trait of interest and single-tissue eQTL and sQTL signals (Giambartolomei et al., 2014). The input to ColocQuiaL can be a single GWAS locus, a list of GWAS loci of interest, or the summary statistics across the entire genome (Fig. 1). Users can specify the lead SNPs and the genomic intervals of the colocalization analysis based on prior knowledge of the loci, or they can perform more general analyses by supplying the GWAS summary statistics file and their preferred definition of significant P-values and independent loci via an interface with PLINK (Purcell et al., 2007). In all these scenarios, ColocQuiaL performs a colocalization analysis between the GWAS signal at the locus and each single-tissue eQTL or sQTL signal for which the lead SNP is a significant QTL. ColocQuiaL generates output files that allow both manual review of individual colocalization analyses and quick review of all the analyses performed (Fig. 1).
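The selection rule described above, one colocalization analysis per single-tissue QTL signal for which the lead SNP is a significant QTL, can be illustrated with a minimal Python sketch. The function name, the data layout and the example identifiers are hypothetical and are not taken from the ColocQuiaL code, which is written in R and bash.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class ColocJob(NamedTuple):
    """One planned colocalization analysis (hypothetical structure)."""
    lead_snp: str
    tissue: str
    feature: str  # gene for an eQTL, splicing feature for an sQTL


def plan_colocalizations(
    lead_snps: List[str],
    significant_qtls: Dict[str, Set[Tuple[str, str]]],
) -> List[ColocJob]:
    """List one job per (tissue, feature) for which a lead SNP is a significant QTL.

    `significant_qtls` maps a SNP identifier to the set of (tissue, feature)
    pairs in which that SNP is a significant QTL. Lead SNPs that are not a
    significant QTL anywhere produce no jobs.
    """
    jobs: List[ColocJob] = []
    for snp in lead_snps:
        # Sort so that the job order is reproducible across runs.
        for tissue, feature in sorted(significant_qtls.get(snp, set())):
            jobs.append(ColocJob(snp, tissue, feature))
    return jobs


# Toy input with made-up SNP, tissue and gene names (assumptions for illustration).
significant_qtls = {
    "rs0000001": {("Pancreas", "GENE_A"), ("Liver", "GENE_A")},
    "rs0000002": {("Adipose_Subcutaneous", "GENE_B")},
}
jobs = plan_colocalizations(
    ["rs0000001", "rs0000002", "rs0000003"], significant_qtls
)
for job in jobs:
    print(job)
```

In this toy input the third lead SNP is not a significant QTL in any tissue, so no analysis is planned for it. This also shows why the number of analyses per lead SNP can vary widely.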
Fig. 1. ColocQuiaL workflow. The first panel shows the possible GWAS inputs that ColocQuiaL accepts. The second panel demonstrates how ColocQuiaL performs colocalizations between the available QTL signals and the GWAS signals provided. The last panel demonstrates the regional association plots and the summary of colocalization results output that ColocQuiaL provides.

The majority of these output files are deposited in lead SNP-specific directories. The COLOC results and intermediary files for each colocalization analysis at a lead SNP are all saved to the directory specific to that lead SNP. This directory also includes regional association plots for each QTL-tissue signal involved in a colocalization analysis and for the GWAS trait signal at the locus. These regional association plots are similar to those generated by the popular tool LocusZoom, but are generated as part of the ColocQuiaL code (Fig. 1) (Pruim et al., 2010). Finally, ColocQuiaL generates a summary output file that contains all of the locus-level posterior probabilities for the COLOC analyses of the ColocQuiaL run (Fig. 1).

The ColocQuiaL pipeline is written in R (v3.6.3 or later) and bash. It executes COLOC with its default priors and is compatible with at least COLOC versions 4 and 5. We implemented a version of ColocQuiaL that is parallelized at the lead SNP level via the LSF workload submission system, and an in-series version that can be modified for other job submission systems. ColocQuiaL also interfaces with the following standard bioinformatic tools: PLINK (v1.90Beta45), bedtools (v2.29.1) and Tabix (v0.2.5) (Li, 2011; Purcell et al., 2007; Quinlan and Hall, 2010). To run the pipeline, the user will need to configure a small number of dependency files from the summary statistics of the QTL dataset they wish to use for colocalization analyses.
Detailed instructions on how to download and configure the dependency files for GTEx v8 single-tissue files from the GTEx Portal as well as eQTL Catalogue data from their website are available at https://github.com/bvoightlab/ColocQuiaL. These procedures should also apply to any other eQTL or sQTL dataset for which summary statistics are available.
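For readers unfamiliar with what COLOC computes, the following Python sketch shows how per-SNP log approximate Bayes factors for two traits are combined into posterior probabilities for the five hypotheses (H0 to H4) of the single-causal-variant model of Giambartolomei et al. It is an illustrative re-implementation under stated assumptions, not code from ColocQuiaL or COLOC: the function names are hypothetical, the toy log Bayes factors are invented, and the prior values are assumed to be the COLOC defaults (p1 = 1e-4, p2 = 1e-4, p12 = 1e-5).

```python
import math
from typing import Dict, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty sequence."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def logdiffexp(a: float, b: float) -> float:
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(
    labf1: Sequence[float],
    labf2: Sequence[float],
    p1: float = 1e-4,   # prior: SNP is causal for trait 1 only (assumed default)
    p2: float = 1e-4,   # prior: SNP is causal for trait 2 only (assumed default)
    p12: float = 1e-5,  # prior: SNP is causal for both traits (assumed default)
) -> Dict[str, float]:
    """Combine per-SNP log Bayes factors into posteriors for H0..H4."""
    if len(labf1) != len(labf2) or len(labf1) == 0:
        raise ValueError("need the same non-empty set of SNPs for both traits")
    sum1 = logsumexp(labf1)
    sum2 = logsumexp(labf2)
    sum12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    log_h = {
        "H0": 0.0,                      # no association with either trait
        "H1": math.log(p1) + sum1,      # association with trait 1 only
        "H2": math.log(p2) + sum2,      # association with trait 2 only
        # both traits associated, two distinct causal variants
        "H3": math.log(p1) + math.log(p2) + logdiffexp(sum1 + sum2, sum12),
        "H4": math.log(p12) + sum12,    # both traits associated, shared variant
    }
    denom = logsumexp(list(log_h.values()))
    return {h: math.exp(v - denom) for h, v in log_h.items()}


# Toy three-SNP loci with invented log Bayes factors.
shared = coloc_posteriors([10.0, 0.0, 0.0], [10.0, 0.0, 0.0])
distinct = coloc_posteriors([10.0, 0.0, 0.0], [0.0, 10.0, 0.0])
print({h: round(p, 4) for h, p in shared.items()})
print({h: round(p, 4) for h, p in distinct.items()})
```

When the strongest evidence for both traits sits on the same SNP, the H4 term dominates. When it sits on different SNPs, H3 outweighs H4.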

3 Usage scenario

As a use case, we used ColocQuiaL to perform colocalization analysis of all independent genome-wide significant T2D signals reported in Mahajan et al. with GTEx single-tissue eQTLs and sQTLs, using the Vujkovic et al. T2D summary statistics (Mahajan et al.; Vujkovic et al., 2020). We used the list of 520 genome-wide significant (P-value ≤ 5 × 10^-8) lead SNPs reported in Mahajan et al. as the GWAS loci input for ColocQuiaL, and used the GTEx v8 significant eQTL/sQTL files as the reference for significant QTLs. For this analysis, we considered a conditional posterior probability of colocalization of 0.8 or greater to be evidence of colocalization between the T2D signal and the QTL signal. The conditional posterior probability of colocalization is the posterior probability that there are two significant signals at a locus that colocalize (PP4), divided by the sum of PP4 and the posterior probability that there are two significant signals at the locus that do not colocalize (PP3). We chose this metric to assess colocalization because all GWAS and QTL signals in this analysis have been defined as significant in the Mahajan et al. or GTEx analyses, so the posterior probabilities of the other COLOC hypotheses should be negligible. Across the 520 T2D lead SNPs, we found 278 colocalized (PP4/(PP3 + PP4) ≥ 0.8) with one or more eQTL signals and 148 colocalized with one or more sQTL signals. These colocalizing signals represent 766 genes and 47 tissues among the eQTLs and 268 genes and 48 tissues among the sQTLs. In total, we performed 9563 colocalizations between T2D signals and eQTL signals and 38 994 between T2D signals and sQTL signals. We performed this analysis on a PowerEdge R630 Server (2.2 GHz Xeon E5-2699 v4 Dual 22-Core, 512 GB memory) using the lead SNP parallelized version of ColocQuiaL. The median run time and median maximum memory usage for each lead SNP job were 10 min 1 s and 17.66 GB for the eQTLs and 7 min 49 s and 16.56 GB for the sQTLs.
Both the eQTL and sQTL analyses had a small number of outlier lead SNPs that were significant for a much larger number of eQTL/sQTL signals in GTEx than the average lead SNP, with the maximum number of colocalizations required being 343 for an eQTL lead SNP and 2561 for an sQTL lead SNP. Our results show that these T2D GWAS signals colocalize with QTL signals for many of the genes one would expect, and they replicate recent T2D colocalization studies. We found that QTLs for three maturity-onset diabetes of the young (MODY) genes colocalized with T2D signals. One MODY gene, KCNJ11, had both an eQTL and an sQTL signal that colocalized with T2D signals (Naylor et al.). We also compared our findings to a list of predicted causal genes for T2D (from the T2D knowledge portal) and found that T2D signals colocalized with eQTL or sQTL signals for 22 of the 58 genes. Finally, we compared our results to the recently published T2D QTL colocalization results from Gloudemans et al. (colocalization of T2D and insulin resistance GWAS data with eQTLs and sQTLs from a subset of GTEx tissues) and Alonso et al. (colocalization of T2D GWAS data with islets of Langerhans eQTLs) (Alonso et al., 2021; Gloudemans et al.). Our results replicate 24 of the 46 genes from Gloudemans et al., including PLEKHA1, AP3S2 and HMG20A, and 16 of the 31 genes from Alonso et al., including HMBS, PCBD1 and USP36 (Alonso et al., 2021; Gloudemans et al.).
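The colocalization criterion used in this section, PP4/(PP3 + PP4) ≥ 0.8, can be written as a short Python sketch. The function names and the example posterior values are hypothetical and serve only to illustrate the arithmetic.

```python
def conditional_pp4(pp3: float, pp4: float) -> float:
    """Posterior probability of colocalization, conditional on both traits
    having a signal at the locus: PP4 / (PP3 + PP4)."""
    denom = pp3 + pp4
    if denom <= 0.0:
        raise ValueError("PP3 + PP4 must be positive")
    return pp4 / denom


def is_colocalized(pp3: float, pp4: float, threshold: float = 0.8) -> bool:
    """Apply the threshold used in the usage scenario (0.8 by default)."""
    return conditional_pp4(pp3, pp4) >= threshold


# Invented posterior values for two loci.
print(round(conditional_pp4(0.05, 0.60), 3), is_colocalized(0.05, 0.60))
print(round(conditional_pp4(0.45, 0.04), 3), is_colocalized(0.45, 0.04))
```

In the first invented example PP4 alone (0.60) is below 0.8, yet the conditional value exceeds the threshold because most of the remaining posterior mass lies on hypotheses other than PP3. This is the situation the conditional metric is meant to handle when both signals are already known to be significant.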

4 Discussion

There are a number of ways ColocQuiaL could be used for colocalization analyses that we have not explicitly discussed here. One that may interest users working on multi-trait GWAS is to run the pipeline once for each trait in a multi-trait analysis in order to assess the evidence that the traits share a causal QTL variant at a locus. There are also a number of other features we plan to add to the ColocQuiaL software over time, including compatibility with other QTL data types and the use of other colocalization methods, such as COLOC-SuSiE to account for loci with multiple causal variants and HyPrColoc to allow for rapid colocalization of three or more traits at a locus (Foley et al., 2021; Wallace, 2021). In summary, ColocQuiaL provides a scalable framework to perform colocalization analyses across the genome between an arbitrary GWAS of interest and any eQTL/sQTL datasets for which a user has summary statistics available. It returns user-friendly summary files and regional association plots for reviewing the results, allowing users to efficiently generate causal gene and tissue hypotheses for their GWAS results.

Funding

This work was supported by the American Heart Association [20PRE35120109 to W.P.B.] and National Institutes of Health [DK101478 and DK126194 to B.F.V.].

Conflict of Interest: M.D.R. is on the scientific advisory board for Goldfinch Bio and Cipherome. The remaining authors declare no conflicts of interest.
References

1.  Tabix: fast retrieval of sequence features from generic TAB-delimited files.

Authors:  Heng Li
Journal:  Bioinformatics       Date:  2011-01-05       Impact factor: 6.937

2.  PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors:  Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal:  Am J Hum Genet       Date:  2007-07-25       Impact factor: 11.025

3.  LocusZoom: regional visualization of genome-wide association scan results.

Authors:  Randall J Pruim; Ryan P Welch; Serena Sanna; Tanya M Teslovich; Peter S Chines; Terry P Gliedt; Michael Boehnke; Gonçalo R Abecasis; Cristen J Willer
Journal:  Bioinformatics       Date:  2010-07-15       Impact factor: 6.937

4.  A Novel Approach to High-Quality Postmortem Tissue Procurement: The GTEx Project.

Authors:  Latarsha J Carithers; Kristin Ardlie; Mary Barcus; Philip A Branton; Angela Britton; Stephen A Buia; Carolyn C Compton; David S DeLuca; Joanne Peter-Demchok; Ellen T Gelfand; Ping Guan; Greg E Korzeniewski; Nicole C Lockhart; Chana A Rabiner; Abhi K Rao; Karna L Robinson; Nancy V Roche; Sherilyn J Sawyer; Ayellet V Segrè; Charles E Shive; Anna M Smith; Leslie H Sobin; Anita H Undale; Kimberly M Valentino; Jim Vaught; Taylor R Young; Helen M Moore
Journal:  Biopreserv Biobank       Date:  2015-10       Impact factor: 2.300

5.  The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.

Authors:  Annalisa Buniello; Jacqueline A L MacArthur; Maria Cerezo; Laura W Harris; James Hayhurst; Cinzia Malangone; Aoife McMahon; Joannella Morales; Edward Mountjoy; Elliot Sollis; Daniel Suveges; Olga Vrousgou; Patricia L Whetzel; Ridwan Amode; Jose A Guillen; Harpreet S Riat; Stephen J Trevanion; Peggy Hall; Heather Junkins; Paul Flicek; Tony Burdett; Lucia A Hindorff; Fiona Cunningham; Helen Parkinson
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

6.  A fast and efficient colocalization algorithm for identifying shared genetic risk factors across multiple traits.

Authors:  Christopher N Foley; James R Staley; Philip G Breen; Benjamin B Sun; Paul D W Kirk; Stephen Burgess; Joanna M M Howson
Journal:  Nat Commun       Date:  2021-02-03       Impact factor: 14.919

7.  TIGER: The gene expression regulatory variation landscape of human pancreatic islets.

Authors:  Lorena Alonso; Anthony Piron; Ignasi Morán; Marta Guindo-Martínez; Sílvia Bonàs-Guarch; Goutham Atla; Irene Miguel-Escalada; Romina Royo; Montserrat Puiggròs; Xavier Garcia-Hurtado; Mara Suleiman; Lorella Marselli; Jonathan L S Esguerra; Jean-Valéry Turatsinze; Jason M Torres; Vibe Nylander; Ji Chen; Lena Eliasson; Matthieu Defrance; Ramon Amela; Hindrik Mulder; Anna L Gloyn; Leif Groop; Piero Marchetti; Decio L Eizirik; Jorge Ferrer; Josep M Mercader; Miriam Cnop; David Torrents
Journal:  Cell Rep       Date:  2021-10-12       Impact factor: 9.423

8.  Bayesian test for colocalisation between pairs of genetic association studies using summary statistics.

Authors:  Claudia Giambartolomei; Damjan Vukcevic; Eric E Schadt; Lude Franke; Aroon D Hingorani; Chris Wallace; Vincent Plagnol
Journal:  PLoS Genet       Date:  2014-05-15       Impact factor: 5.917

9.  Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis.

Authors:  Marijana Vujkovic; Jacob M Keaton; Kyong-Mi Chang; Benjamin F Voight; Danish Saleheen; Julie A Lynch; Donald R Miller; Jin Zhou; Catherine Tcheandjieu; Jennifer E Huffman; Themistocles L Assimes; Kimberly Lorenz; Xiang Zhu; Austin T Hilliard; Renae L Judy; Jie Huang; Kyung M Lee; Derek Klarin; Saiju Pyarajan; John Danesh; Olle Melander; Asif Rasheed; Nadeem H Mallick; Shahid Hameed; Irshad H Qureshi; Muhammad Naeem Afzal; Uzma Malik; Anjum Jalal; Shahid Abbas; Xin Sheng; Long Gao; Klaus H Kaestner; Katalin Susztak; Yan V Sun; Scott L DuVall; Kelly Cho; Jennifer S Lee; J Michael Gaziano; Lawrence S Phillips; James B Meigs; Peter D Reaven; Peter W Wilson; Todd L Edwards; Daniel J Rader; Scott M Damrauer; Christopher J O'Donnell; Philip S Tsao
Journal:  Nat Genet       Date:  2020-06-15       Impact factor: 38.330

10.  A more accurate method for colocalisation analysis allowing for multiple causal variants.

Authors:  Chris Wallace
Journal:  PLoS Genet       Date:  2021-09-29       Impact factor: 5.917
