SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci.

Kamil Slowikowski, Xinli Hu, Soumya Raychaudhuri.

Abstract

We created a fast, robust and general C++ implementation of a single-nucleotide polymorphism (SNP) set enrichment algorithm to identify cell types, tissues and pathways affected by risk loci. It tests trait-associated genomic loci for enrichment of specificity to conditions (cell types, tissues and pathways). We use a non-parametric statistical approach to compute empirical P-values by comparison with null SNP sets. As a proof of concept, we present novel applications of our method to four sets of genome-wide significant SNPs associated with red blood cell count, multiple sclerosis, celiac disease and HDL cholesterol.
AVAILABILITY AND IMPLEMENTATION: http://broadinstitute.org/mpg/snpsea.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press.

Year:  2014        PMID: 24813542      PMCID: PMC4147889          DOI: 10.1093/bioinformatics/btu326

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

As genome-wide association studies (GWAS) continue to find disease alleles, investigators seek to identify the set of pathways and tissue types affected by these alleles, and the physiological conditions under which they act (Elbers et al.; Lango Allen et al.; Raychaudhuri, 2011; Wang et al.; Yaspan and Veatch, 2011). For example, we have previously presented statistical methods to identify immune cell types for further functional investigation by finding cell type-specific expression of genes in linkage disequilibrium (LD) with autoimmune disease-associated single-nucleotide polymorphisms (SNPs) (Hu et al.). Presumably, alleles influence disease risk through pathways specific to these cell types. We sought a general implementation of these methods to leverage data from high-throughput functional assays that assess genome-wide transcription, protein binding, epigenetic modifications and other functional parameters across diverse cellular conditions and tissue types. Each of these diverse data types can be represented as a continuous matrix of genes and conditions (e.g. cell types, tissues, pathways, experimental conditions). Databases such as Gene Ontology (GO) (Botstein et al.) offer expert-defined pathways and complementary gene annotations that can be represented as binary values. Investigators have already described strategies to assess enrichment of GWA results for pathways or gene sets but not for condition specificity (Holden et al.; Weng et al.). In contrast to these methods, we do not require genotypes, P-values, a priori gene sets or pathways or a priori definitions of gene–SNP associations. We require only a list of SNP identifiers, use LD structures to identify plausibly influential genes and use a simple sampling approach to identify the conditions they influence. SNPsea is a general algorithm to identify the conditions relevant to a trait by assessing the genes within associated loci for enrichment of condition specificity.

2 METHODS

For a given set of SNPs, SNPsea tests genes implicated by LD, in aggregate, for enrichment of specificity to a condition in a given matrix of genes and conditions. The matrix must be normalized so that conditions are comparable. First, we identify genes implicated by each SNP using LD from reference genomes. Second, we calculate a specificity score for each condition with these genes. Finally, we compare these scores with scores obtained with matched null SNP sets to calculate an empirical P-value for each condition (see Supplementary Notes for algorithm details). We empirically calculate P-values because we previously found that analytical distributions can result in inaccurate P-values (Hu et al.). SNP linkage intervals, gene densities, gene sizes and gene functions are correlated across the genome and are challenging to model analytically. We used C++ for fast computation of P-values because Python was prohibitively slow. The online reference manual details compilation and installation procedures; we also provide executable files for immediate use on select platforms.
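The three steps above can be illustrated with a minimal Python sketch. It is not the SNPsea implementation (which is written in C++ and specified in the Supplementary Notes). The function names, the toy gene names, the rule of scoring a locus by its most specific gene, the matching of null loci on gene count and the add-one P-value estimator are all assumptions made for illustration.

```python
import random

def locus_score(genes, specificity):
    # Assumption: one influential gene per locus, so the locus is scored
    # by its most specific gene for the condition under test.
    return max(specificity[g] for g in genes)

def set_score(loci, specificity):
    # Aggregate score of a SNP set: sum of its locus scores.
    return sum(locus_score(genes, specificity) for genes in loci)

def sample_null_loci(loci, all_genes, rng):
    # Toy matching: each null locus has as many genes as the observed locus.
    return [rng.sample(all_genes, len(genes)) for genes in loci]

def empirical_p(observed, null_scores):
    # Share of null scores at least as large as the observed score.
    # Adding 1 to both counts keeps the estimate above zero.
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)

rng = random.Random(0)
all_genes = [f"g{i}" for i in range(200)]          # hypothetical gene names
specificity = {g: rng.random() for g in all_genes}  # one condition, values in [0, 1)
for g in ("g1", "g2", "g3", "g4"):                  # make four genes highly specific
    specificity[g] = 0.999

# Hypothetical trait-associated loci, each a list of genes implicated by LD.
loci = [["g1", "g10"], ["g2"], ["g3", "g20", "g30"], ["g4", "g40"]]
observed = set_score(loci, specificity)
null_scores = [
    set_score(sample_null_loci(loci, all_genes, rng), specificity)
    for _ in range(1000)
]
p = empirical_p(observed, null_scores)
print(f"observed score = {observed:.3f}, empirical P = {p:.4f}")
```

With this estimator the smallest attainable P-value is 1/(n + 1) for n null sets, so resolving small P-values requires many null sets, which is one reason a fast implementation matters.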

2.1 Multiple genes implicated by LD

Accurate analyses must address the critical issue that SNPs from GWA studies frequently implicate more than one gene (50% of GWAS Catalog SNPs, Supplementary Fig. S2). We defined LD intervals with SNPs from the 1000 Genomes Project (EUR) (1000 Genomes Project Consortium, 2010) and a previously described strategy (Supplementary Fig. S1) (Rossin et al.). A SNP implicates genes overlapping its LD interval, defined by the furthest SNPs in a 1 Mb window with r² > 0.5. To ensure the associated genes are included, we extend each interval to the nearest recombination hotspots with recombination rate >3 cM/Mb (HapMap3) (Myers et al.). We merge SNPs with shared genes into a single locus. By default, we assume that each associated locus harbours a single influential gene rather than multiple genes. We provide an alternative scoring method to account for multiple genes (Supplementary Notes) that produces similar results in four traits we tested (Supplementary Fig. S4). Because interval lengths depend on the choice of r² threshold, we looked for an effect of this choice (Supplementary Fig. S3). The significant result for the Gene Atlas and blood cell count SNPs is robust to different thresholds. Similarly, the choice of r² threshold has little effect on the significant GO enrichment result for these SNPs.
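The mapping from LD intervals to genes, and the merging of SNPs with shared genes, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the LD intervals are taken as already computed and extended to recombination hotspots, and the SNP names, gene names, coordinates and function names are hypothetical.

```python
def genes_in_interval(interval, gene_coords):
    # A gene is implicated if it lies on the same chromosome and
    # overlaps the LD interval by at least one base.
    chrom, start, end = interval
    return {
        name
        for name, (g_chrom, g_start, g_end) in gene_coords.items()
        if g_chrom == chrom and g_start <= end and g_end >= start
    }

def merge_loci(snp_genes):
    # Merge SNPs that share at least one gene into a single locus.
    # Existing loci always have disjoint gene sets, so one pass per
    # new SNP is enough to keep the merge transitive.
    loci = []  # list of (set of SNPs, set of genes)
    for snp, genes in snp_genes.items():
        snps, merged = {snp}, set(genes)
        untouched = []
        for other_snps, other_genes in loci:
            if merged & other_genes:
                snps |= other_snps
                merged |= other_genes
            else:
                untouched.append((other_snps, other_genes))
        loci = untouched + [(snps, merged)]
    return loci

# Hypothetical gene coordinates: name -> (chromosome, start, end).
gene_coords = {
    "GENE_A": ("1", 100, 200),
    "GENE_B": ("1", 250, 400),
    "GENE_C": ("1", 900, 1000),
    "GENE_D": ("2", 100, 200),
}
# Hypothetical LD intervals: SNP -> (chromosome, start, end).
intervals = {
    "snp1": ("1", 150, 260),   # overlaps GENE_A and GENE_B
    "snp2": ("1", 390, 500),   # overlaps GENE_B
    "snp3": ("1", 950, 960),   # overlaps GENE_C
    "snp4": ("2", 500, 600),   # overlaps no gene
}
snp_genes = {snp: genes_in_interval(iv, gene_coords) for snp, iv in intervals.items()}
loci = merge_loci(snp_genes)
for snps, genes in loci:
    print(sorted(snps), sorted(genes))
```

In this toy input, snp1 and snp2 share GENE_B and therefore collapse into one locus, while snp4 remains a locus without any implicated gene. How such gene-free loci are handled is a design choice that this sketch leaves open.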

2.2 Type I error estimates

We tested 10 000 sets of 100 randomly selected LD-pruned SNPs. For each condition (tissue or GO term), we observed appropriate proportions of P-values <0.5, 0.1, 0.05, 0.01 and 0.005 (Supplementary Fig. S5).
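A calibration check of this kind can be sketched in Python. Under the null hypothesis, well-calibrated P-values are approximately uniform, so the proportion below a threshold t should be close to t. In the sketch below, uniform random draws stand in for the P-values that SNPsea would report for random SNP sets; that stand-in, the seed and the function name are assumptions for illustration only.

```python
import random

def proportion_below(p_values, thresholds):
    # For each threshold t, the fraction of P-values strictly below t.
    n = len(p_values)
    return {t: sum(p < t for p in p_values) / n for t in thresholds}

rng = random.Random(1)
# Stand-in for the P-values of 10 000 random SNP sets for one condition.
p_values = [rng.random() for _ in range(10_000)]
thresholds = (0.5, 0.1, 0.05, 0.01, 0.005)

for t, frac in proportion_below(p_values, thresholds).items():
    print(f"P < {t}: observed proportion {frac:.4f}")
```

An observed proportion well above the threshold would indicate inflated type I error; a proportion well below it would indicate a conservative test.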

3 EXAMPLES

We used SNPsea to identify tissues relevant to blood cell count by testing 45 genome-wide significant SNPs (van der Harst et al.) with expression data (Gene Atlas) for 17 581 genes across 79 human tissues (Su et al.). Bone marrow CD71+ early erythroid cells are significantly enriched for cell type-specific expression of the genes within the trait-associated loci (P = 2 × 10⁻⁷) (Fig. 1).
Fig. 1. Empirical P-values for specificity to each condition. 25 of 79 tissues (Gene Atlas) are shown. Adjacent: Pearson correlation coefficients for pairs of expression profiles ordered by hierarchical clustering with UPGMA.

The genes in these loci are enriched for the term hemopoiesis (GO:0030097) (P = 2 × 10⁻⁵) (Supplementary Fig. S6), suggesting that blood cell count may be influenced by the genes expressed specifically in early erythroid cells and involved in forming blood cellular components. We provide additional examples for SNPs associated with multiple sclerosis, celiac disease and HDL cholesterol. Each includes Gene Atlas and GO enrichments, r² comparisons and comparisons of results assuming a single or multiple causal genes (Supplementary Figs S7–9).

Funding: The National Institutes of Health (5K08AR055688, 1R01AR062886-01, 1U01HG0070033, T32 HG002295/HG/NHGRI, 7T32HG002295-10), the Arthritis Foundation and the Doris Duke Foundation.

Conflict of Interest: none declared.

REFERENCES

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  Mapping rare and common causal alleles for complex human diseases.

Authors:  Soumya Raychaudhuri
Journal:  Cell       Date:  2011-09-30       Impact factor: 41.582

3.  A fine-scale map of recombination rates and hotspots across the human genome.

Authors:  Simon Myers; Leonardo Bottolo; Colin Freeman; Gil McVean; Peter Donnelly
Journal:  Science       Date:  2005-10-14       Impact factor: 47.728

4.  Association of polymorphisms in the Chr18q11.2 locus with tuberculosis in Chinese population.

Authors:  Xingyan Wang; Nelson Leung-Sang Tang; Chi Chiu Leung; Kai Man Kam; Wing Wai Yew; Cheuk Ming Tam; Chiu Yeung Chan
Journal:  Hum Genet       Date:  2013-03-03       Impact factor: 4.132

5.  Using genome-wide pathway analysis to unravel the etiology of complex diseases.

Authors:  Clara C Elbers; Kristel R van Eijk; Lude Franke; Flip Mulder; Yvonne T van der Schouw; Cisca Wijmenga; N Charlotte Onland-Moret
Journal:  Genet Epidemiol       Date:  2009-07       Impact factor: 2.135

6.  GSEA-SNP: applying gene set enrichment analysis to SNP data from genome-wide association studies.

Authors:  Marit Holden; Shiwei Deng; Leszek Wojnowski; Bettina Kulle
Journal:  Bioinformatics       Date:  2008-10-14       Impact factor: 6.937

7.  A gene atlas of the mouse and human protein-encoding transcriptomes.

Authors:  Andrew I Su; Tim Wiltshire; Serge Batalov; Hilmar Lapp; Keith A Ching; David Block; Jie Zhang; Richard Soden; Mimi Hayakawa; Gabriel Kreiman; Michael P Cooke; John R Walker; John B Hogenesch
Journal:  Proc Natl Acad Sci U S A       Date:  2004-04-09       Impact factor: 11.205

8.  Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology.

Authors:  Elizabeth J Rossin; Kasper Lage; Soumya Raychaudhuri; Ramnik J Xavier; Diana Tatar; Yair Benita; Chris Cotsapas; Mark J Daly
Journal:  PLoS Genet       Date:  2011-01-13       Impact factor: 5.917

9.  Hundreds of variants clustered in genomic loci and biological pathways affect human height.

Authors:  Hana Lango Allen; Karol Estrada; Guillaume Lettre; Sonja I Berndt; Michael N Weedon; Fernando Rivadeneira; Cristen J Willer; Anne U Jackson; Sailaja Vedantam; Soumya Raychaudhuri; Teresa Ferreira; Andrew R Wood; Robert J Weyant; Ayellet V Segrè; Elizabeth K Speliotes; Eleanor Wheeler; Nicole Soranzo; Ju-Hyun Park; Jian Yang; Daniel Gudbjartsson; Nancy L Heard-Costa; Joshua C Randall; Lu Qi; Albert Vernon Smith; Reedik Mägi; Tomi Pastinen; Liming Liang; Iris M Heid; Jian'an Luan; Gudmar Thorleifsson; Thomas W Winkler; Michael E Goddard; Ken Sin Lo; Cameron Palmer; Tsegaselassie Workalemahu; Yurii S Aulchenko; Asa Johansson; M Carola Zillikens; Mary F Feitosa; Tõnu Esko; Toby Johnson; Shamika Ketkar; Peter Kraft; Massimo Mangino; Inga Prokopenko; Devin Absher; Eva Albrecht; Florian Ernst; Nicole L Glazer; Caroline Hayward; Jouke-Jan Hottenga; Kevin B Jacobs; Joshua W Knowles; Zoltán Kutalik; Keri L Monda; Ozren Polasek; Michael Preuss; Nigel W Rayner; Neil R Robertson; Valgerdur Steinthorsdottir; Jonathan P Tyrer; Benjamin F Voight; Fredrik Wiklund; Jianfeng Xu; Jing Hua Zhao; Dale R Nyholt; Niina Pellikka; Markus Perola; John R B Perry; Ida Surakka; Mari-Liis Tammesoo; Elizabeth L Altmaier; Najaf Amin; Thor Aspelund; Tushar Bhangale; Gabrielle Boucher; Daniel I Chasman; Constance Chen; Lachlan Coin; Matthew N Cooper; Anna L Dixon; Quince Gibson; Elin Grundberg; Ke Hao; M Juhani Junttila; Lee M Kaplan; Johannes Kettunen; Inke R König; Tony Kwan; Robert W Lawrence; Douglas F Levinson; Mattias Lorentzon; Barbara McKnight; Andrew P Morris; Martina Müller; Julius Suh Ngwa; Shaun Purcell; Suzanne Rafelt; Rany M Salem; Erika Salvi; Serena Sanna; Jianxin Shi; Ulla Sovio; John R Thompson; Michael C Turchin; Liesbeth Vandenput; Dominique J Verlaan; Veronique Vitart; Charles C White; Andreas Ziegler; Peter Almgren; Anthony J Balmforth; Harry Campbell; Lorena Citterio; Alessandro De Grandi; Anna Dominiczak; Jubao Duan; Paul Elliott; Roberto Elosua; Johan G Eriksson; Nelson B Freimer; Eco J C Geus; Nicola Glorioso; Shen Haiqing; Anna-Liisa Hartikainen; Aki S Havulinna; Andrew A Hicks; Jennie Hui; Wilmar Igl; Thomas Illig; Antti Jula; Eero Kajantie; Tuomas O Kilpeläinen; Markku Koiranen; Ivana Kolcic; Seppo Koskinen; Peter Kovacs; Jaana Laitinen; Jianjun Liu; Marja-Liisa Lokki; Ana Marusic; Andrea Maschio; Thomas Meitinger; Antonella Mulas; Guillaume Paré; Alex N Parker; John F Peden; Astrid Petersmann; Irene Pichler; Kirsi H Pietiläinen; Anneli Pouta; Martin Ridderstråle; Jerome I Rotter; Jennifer G Sambrook; Alan R Sanders; Carsten Oliver Schmidt; Juha Sinisalo; Jan H Smit; Heather M Stringham; G Bragi Walters; Elisabeth Widen; Sarah H Wild; Gonneke Willemsen; Laura Zagato; Lina Zgaga; Paavo Zitting; Helene Alavere; Martin Farrall; Wendy L McArdle; Mari Nelis; Marjolein J Peters; Samuli Ripatti; Joyce B J van Meurs; Katja K Aben; Kristin G Ardlie; Jacques S Beckmann; John P Beilby; Richard N Bergman; Sven Bergmann; Francis S Collins; Daniele Cusi; Martin den Heijer; Gudny Eiriksdottir; Pablo V Gejman; Alistair S Hall; Anders Hamsten; Heikki V Huikuri; Carlos Iribarren; Mika Kähönen; Jaakko Kaprio; Sekar Kathiresan; Lambertus Kiemeney; Thomas Kocher; Lenore J Launer; Terho Lehtimäki; Olle Melander; Tom H Mosley; Arthur W Musk; Markku S Nieminen; Christopher J O'Donnell; Claes Ohlsson; Ben Oostra; Lyle J Palmer; Olli Raitakari; Paul M Ridker; John D Rioux; Aila Rissanen; Carlo Rivolta; Heribert Schunkert; Alan R Shuldiner; David S Siscovick; Michael Stumvoll; Anke Tönjes; Jaakko Tuomilehto; Gert-Jan van Ommen; Jorma Viikari; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael A Province; Manfred Kayser; Alice M Arnold; Larry D Atwood; Eric Boerwinkle; Stephen J Chanock; Panos Deloukas; Christian Gieger; Henrik Grönberg; Per Hall; Andrew T Hattersley; Christian Hengstenberg; Wolfgang Hoffman; G Mark Lathrop; Veikko Salomaa; Stefan Schreiber; Manuela Uda; Dawn Waterworth; Alan F Wright; Themistocles L Assimes; Inês Barroso; Albert Hofman; Karen L Mohlke; Dorret I Boomsma; Mark J Caulfield; L Adrienne Cupples; Jeanette Erdmann; Caroline S Fox; Vilmundur Gudnason; Ulf Gyllensten; Tamara B Harris; Richard B Hayes; Marjo-Riitta Jarvelin; Vincent Mooser; Patricia B Munroe; Willem H Ouwehand; Brenda W Penninx; Peter P Pramstaller; Thomas Quertermous; Igor Rudan; Nilesh J Samani; Timothy D Spector; Henry Völzke; Hugh Watkins; James F Wilson; Leif C Groop; Talin Haritunians; Frank B Hu; Robert C Kaplan; Andres Metspalu; Kari E North; David Schlessinger; Nicholas J Wareham; David J Hunter; Jeffrey R O'Connell; David P Strachan; H-Erich Wichmann; Ingrid B Borecki; Cornelia M van Duijn; Eric E Schadt; Unnur Thorsteinsdottir; Leena Peltonen; André G Uitterlinden; Peter M Visscher; Nilanjan Chatterjee; Ruth J F Loos; Michael Boehnke; Mark I McCarthy; Erik Ingelsson; Cecilia M Lindgren; Gonçalo R Abecasis; Kari Stefansson; Timothy M Frayling; Joel N Hirschhorn
Journal:  Nature       Date:  2010-09-29       Impact factor: 49.962

10.  SNP-based pathway enrichment analysis for genome-wide association studies.

Authors:  Lingjie Weng; Fabio Macciardi; Aravind Subramanian; Guia Guffanti; Steven G Potkin; Zhaoxia Yu; Xiaohui Xie
Journal:  BMC Bioinformatics       Date:  2011-04-15       Impact factor: 3.169
