Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies.

Tsun-Po Yang, Claude Beazley, Stephen B Montgomery, Antigone S Dimas, Maria Gutierrez-Arcelus, Barbara E Stranger, Panos Deloukas, Emmanouil T Dermitzakis.

Abstract

Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets and to provide analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols. AVAILABILITY: http://www.sanger.ac.uk/resources/software/genevar.

Year:  2010        PMID: 20702402      PMCID: PMC2944204          DOI: 10.1093/bioinformatics/btq452

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Expression quantitative trait loci (eQTL) mapping, where gene expression profiling is treated as a phenotypic trait in genome-wide association studies (GWAS), has been employed successfully in recent studies to uncover genetic variants that influence expression variation (Dixon et al., 2007; Stranger et al., 2007a). Single-nucleotide polymorphism (SNP)–gene associations from eQTL analysis can be investigated in populations (Stranger et al., 2007b) or among tissue types (Dimas et al., 2009; Heinzen et al., 2008). In addition to genome-wide eQTL identification, combinations of eQTLs and lead SNPs identified by GWAS have been used to interrogate the mechanisms underlying disease susceptibility at specific loci (Grundberg et al., 2009; Nica et al., 2010; Zeller et al., 2010). However, an analytical and visualization tool, together with a structured repository for multiple datasets, is still needed to facilitate the investigation of loci of interest and to share data publicly and among collaborators.
Here, we present Genevar, a database and Java tool designed to provide: (i) data warehousing; (ii) real-time computation of correlation significance; (iii) visualization of mapping results in a user-friendly interface; and (iv) an added web services platform that is implemented as a bridge between the server and multiple users. Genevar allows published data to be visually accessible in a secure fashion, without the need for users to download raw data. Through interactive analysis pipelines, researchers are able to rapidly investigate, for instance, cis-acting eQTLs at the locus of interest. Complementing already available standalone tools (Chen et al., 2009; Ge et al., 2008), a database-centric architecture enables Genevar to perform complex queries on-the-fly without the high memory requirement of reading large-scale datasets in beforehand.
Furthermore, exploiting the convenience of web-based (Wang et al., 2003; Zou et al., 2007) and web-launch (Mueller et al., 2005) tools, a Java interface was developed that connects to both database and web services. The main advantage of this system design is that users can switch between public services and local data on the same interface. Default services at the Sanger Institute currently contain gene expression profiling and genotypic data from the following two datasets: lymphoblastoid cell lines from eight HapMap3 populations (824 individuals, unpublished data); and three cell types derived from umbilical cords of 75 Geneva GenCord individuals (Dimas et al., 2009).

2 FEATURES

Genevar has two main functionalities in cis-eQTL analysis: (i) identifying eQTLs in genes of interest, and (ii) observing SNP–gene associations surrounding SNPs of interest (Fig. 1). Additional features include SNP–probe association plots and external links to three major genome browsers. Either cis- or trans-eQTLs can be plotted in the SNP–probe association plot module. Mapping results are listed in tree nodes in a structured manner, and information can be saved as PNG diagrams or exported as tab-delimited lists for further use in presentations or publications.
Fig. 1.

Results of Genevar: a scatter plot represents observed eQTLs in a 2 Mb window centered on the GBP3 locus in HapMap3 CHB (A), and a line chart illustrates observed SNP–gene associations in a 2 Mb region surrounding the rs13277113 SNP in eight HapMap3 populations (B).

Genevar is compatible with PLINK (Purcell et al., 2007) genotype data formats and any tab-delimited expression/genotyping file in our format. After uploading datasets onto the database, Genevar presents expression profiling data and individual genotypes in two cataloged management panels. Once a group of datasets is selected in the follow-up analysis pipelines, the software automatically prompts available expression–genotype pairs for the user to choose from.
Spearman's rank correlation coefficient is computed to estimate the strength of the relationship between alleles and gene expression intensities; linear regression is also used to model the relationship between the two variables. To test the significance of the relationship, a t-statistic is employed with n − 2 degrees of freedom for both correlation and regression analysis (Stranger et al., 2007b). The software allows the user to adjust the window size centered on the gene/SNP of interest (e.g. 2 Mb) and to set a user-defined P-value threshold (e.g. P < 0.001) for the featured cis-eQTL analysis. Non-parametric permutation P-values are also provided in the subsequent association plot module to further evaluate the significance of nominal P-values. To construct a distribution of the test statistic under the null hypothesis of no SNP–probe associations, expression intensities are randomly re-assigned to individuals' genotypes, the correlation coefficient and statistical significance are re-computed for the relabeled traits, and this procedure is repeated 10 000 times (Stranger et al., 2005).
We recommend that users launch Genevar via Java Web Start from our homepage for the most up-to-date version.
After launching, Genevar is initially in web services mode, connected to the Sanger Institute. The user can then connect to services at affiliated institutes, or switch to database mode and connect directly to the user's local database. Genevar can be run completely offline in database mode, as in that mode there is no communication between the Java interface and the Sanger server. Future work will include modified visualization for displaying next-generation sequence data, e.g. RNA-Seq (Montgomery et al., 2010); and implementation of methylation modules to interrogate epigenomic data.
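
The association test described in this section can be illustrated with a minimal sketch. It is not Genevar's Java code. It assumes genotypes are coded as allele counts (0/1/2), that tied values receive averaged ranks, that the permutation test is two-sided, and that the permutation P-value uses an add-one estimator; the function names are hypothetical and the article does not specify these details.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(genotypes, expression):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(genotypes), average_ranks(expression))


def t_statistic(rho, n):
    """t = rho * sqrt((n - 2) / (1 - rho^2)), with n - 2 degrees of freedom."""
    if abs(rho) >= 1.0:
        return math.copysign(math.inf, rho)
    return rho * math.sqrt((n - 2) / (1.0 - rho * rho))


def permutation_p_value(genotypes, expression, n_permutations=10000, seed=0):
    """Shuffle expression values across individuals and recompute rho each time."""
    rng = random.Random(seed)
    genotype_ranks = average_ranks(genotypes)
    expression_ranks = average_ranks(expression)
    observed = abs(pearson(genotype_ranks, expression_ranks))
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(expression_ranks)  # ranks are unchanged by relabeling
        if abs(pearson(genotype_ranks, expression_ranks)) >= observed - 1e-12:
            hits += 1
    # Add-one estimator (an assumption) so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up illustration data: 10 individuals, one SNP, one probe.
    genotypes = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
    expression = [5.1, 4.8, 5.3, 6.0, 6.2, 5.9, 6.4, 7.1, 6.8, 7.3]
    rho = spearman_rho(genotypes, expression)
    print("rho:", rho)
    print("t:", t_statistic(rho, len(genotypes)))
    print("permutation P:", permutation_p_value(genotypes, expression))
```

Converting the t-statistic to a nominal P-value requires the cumulative distribution function of Student's t with n − 2 degrees of freedom, which the Python standard library does not provide, so the sketch stops at the statistic itself.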

3 IMPLEMENTATION

Genevar's relational database design systematically decomposes traditional flat files, which hold one record per line and have no structural relationships between records, into grouped dimension tables to reduce data redundancy. A normalized and structured repository is suitable for warehousing all kinds of data formats, regardless of file size and number of fields. Most importantly, database indexing on the expression and genotype fact tables keeps retrieval performance stable, at the reasonable cost of slower uploads and increased disk space. As datasets grow, the only limitation would be storage space, which is traded for query speed.
To maximize the potential of Genevar as a platform shared among affiliations, Genevar has been extended to interact with web services protocols to enhance data security; the database schema is deployed behind, and protected by, the firewall, and only a secure front-end webpage acting as a middle layer is accessible to the user over the Internet. Genevar uses the Hibernate library (http://www.hibernate.org) to map object-oriented models onto MySQL relational database tables (http://www.mysql.com) in the back-end, and uses the Apache CXF framework (http://cxf.apache.org) to wrap database queries and business logic into middle-layer services. Finally, a Tomcat server (http://tomcat.apache.org) is used to provide services in the front-end. For a standalone database-mode Genevar, only a MySQL database needs to be installed on the user's local machine.
Association results are visualized in genomic views using the JFreeChart library (http://www.jfree.org/jfreechart/). A gene-centered scatter plot represents observed SNP–gene associations around genes of interest, and a SNP-centered line chart illustrates observed eQTLs surrounding SNPs of interest (Fig. 1).
Tested on a 1.6 GHz Pentium Centrino laptop with 1 GB of RAM, Genevar was able to upload a 75 × 23k expression dataset onto the database and build its indexes in 1 min; another 23 min were required for the 75 × 400k genotype file. Once the data are uploaded, Genevar can fetch each SNP–probe pair for these 75 individuals from the database in <0.0257 s, and calculate Spearman's rho and nominal P-values for 486 SNP–probe pairs in 3 s.
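
The storage layout described in this section (dimension tables, indexed fact tables, per-pair retrieval) can be sketched as follows. This is an illustration only: Python's built-in sqlite3 module stands in for the MySQL and Hibernate stack that Genevar actually uses, every table, column and function name is hypothetical, and the window is assumed to be a total width centered on the position of interest.

```python
import sqlite3

# Hypothetical schema: three dimension tables and two indexed fact tables.
SCHEMA = """
CREATE TABLE individual (id INTEGER PRIMARY KEY, label TEXT UNIQUE NOT NULL);
CREATE TABLE probe (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE snp (id INTEGER PRIMARY KEY, rsid TEXT UNIQUE NOT NULL,
                  chromosome TEXT NOT NULL, position INTEGER NOT NULL);
CREATE TABLE expression (probe_id INTEGER NOT NULL REFERENCES probe(id),
                         individual_id INTEGER NOT NULL REFERENCES individual(id),
                         intensity REAL NOT NULL);
CREATE TABLE genotype (snp_id INTEGER NOT NULL REFERENCES snp(id),
                       individual_id INTEGER NOT NULL REFERENCES individual(id),
                       allele_count INTEGER NOT NULL);
-- Indexes trade slower uploads and more disk space for stable lookups.
CREATE INDEX idx_expression_probe ON expression (probe_id, individual_id);
CREATE INDEX idx_genotype_snp ON genotype (snp_id, individual_id);
CREATE INDEX idx_snp_position ON snp (chromosome, position);
"""


def build_demo_database():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO individual (id, label) VALUES (?, ?)",
                     [(1, "ind1"), (2, "ind2"), (3, "ind3")])
    conn.execute("INSERT INTO probe (id, name) VALUES (1, 'probe_A')")
    conn.execute("INSERT INTO snp (id, rsid, chromosome, position) "
                 "VALUES (1, 'rs_demo', '1', 1000000)")
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)",
                     [(1, 1, 5.1), (1, 2, 6.0), (1, 3, 7.2)])
    conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)",
                     [(1, 1, 0), (1, 2, 1), (1, 3, 2)])
    conn.commit()
    return conn


def snps_in_window(conn, chromosome, center, window_bp):
    """Return rsids within window_bp / 2 either side of center (assumed meaning)."""
    half = window_bp // 2
    rows = conn.execute(
        "SELECT rsid FROM snp WHERE chromosome = ? AND position BETWEEN ? AND ? "
        "ORDER BY position",
        (chromosome, center - half, center + half))
    return [row[0] for row in rows]


def fetch_snp_probe_pair(conn, rsid, probe_name):
    """Return (allele_count, intensity) per individual for one SNP-probe pair."""
    query = """
        SELECT g.allele_count, e.intensity
        FROM genotype AS g
        JOIN snp AS s ON s.id = g.snp_id
        JOIN expression AS e ON e.individual_id = g.individual_id
        JOIN probe AS p ON p.id = e.probe_id
        WHERE s.rsid = ? AND p.name = ?
        ORDER BY g.individual_id
    """
    return conn.execute(query, (rsid, probe_name)).fetchall()


if __name__ == "__main__":
    conn = build_demo_database()
    print(snps_in_window(conn, "1", 1500000, 2000000))
    print(fetch_snp_probe_pair(conn, "rs_demo", "probe_A"))
```

Keeping SNP and probe annotation in dimension tables means each fact row stores only identifiers and one measurement, which is one way to obtain the reduced redundancy the text describes.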
REFERENCES (16 in total)

1.  WebQTL: web-based complex trait analysis.

Authors:  Jintao Wang; Robert W Williams; Kenneth F Manly
Journal:  Neuroinformatics       Date:  2003

2.  GWAS GUI: graphical browser for the results of whole-genome association studies with high-dimensional phenotypes.

Authors:  Wei Chen; Liming Liang; Gonçalo R Abecasis
Journal:  Bioinformatics       Date:  2008-11-20       Impact factor: 6.937

3.  Population genomics in a disease targeted primary cell model.

Authors:  Elin Grundberg; Tony Kwan; Bing Ge; Kevin C L Lam; Vonda Koka; Andreas Kindmark; Hans Mallmin; Joana Dias; Dominique J Verlaan; Manon Ouimet; Daniel Sinnett; Fernando Rivadeneira; Karol Estrada; Albert Hofman; Joyce M van Meurs; André Uitterlinden; Patrick Beaulieu; Alexandru Graziani; Eef Harmsen; Osten Ljunggren; Claes Ohlsson; Dan Mellström; Magnus K Karlsson; Olle Nilsson; Tomi Pastinen
Journal:  Genome Res       Date:  2009-08-04       Impact factor: 9.043

4.  Transcriptome genetics using second generation sequencing in a Caucasian population.

Authors:  Stephen B Montgomery; Micha Sammeth; Maria Gutierrez-Arcelus; Radoslaw P Lach; Catherine Ingle; James Nisbett; Roderic Guigo; Emmanouil T Dermitzakis
Journal:  Nature       Date:  2010-03-10       Impact factor: 49.962

5.  Genetics and beyond--the transcriptome of human monocytes and disease susceptibility.

Authors:  Tanja Zeller; Philipp Wild; Silke Szymczak; Maxime Rotival; Arne Schillert; Raphaele Castagne; Seraya Maouche; Marine Germain; Karl Lackner; Heidi Rossmann; Medea Eleftheriadis; Christoph R Sinning; Renate B Schnabel; Edith Lubos; Detlev Mennerich; Werner Rust; Claire Perret; Carole Proust; Viviane Nicaud; Joseph Loscalzo; Norbert Hübner; David Tregouet; Thomas Münzel; Andreas Ziegler; Laurence Tiret; Stefan Blankenberg; François Cambien
Journal:  PLoS One       Date:  2010-05-18       Impact factor: 3.240

6.  Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations.

Authors:  Alexandra C Nica; Stephen B Montgomery; Antigone S Dimas; Barbara E Stranger; Claude Beazley; Inês Barroso; Emmanouil T Dermitzakis
Journal:  PLoS Genet       Date:  2010-04-01       Impact factor: 5.917

7.  eQTL Explorer: integrated mining of combined genetic linkage and expression experiments.

Authors:  Michael Mueller; Anuj Goel; Manjula Thimma; Nicholas J Dickens; Timothy J Aitman; Jonathan Mangion
Journal:  Bioinformatics       Date:  2005-12-15       Impact factor: 6.937

8.  Genome-wide associations of gene expression variation in humans.

Authors:  Barbara E Stranger; Matthew S Forrest; Andrew G Clark; Mark J Minichiello; Samuel Deutsch; Robert Lyle; Sarah Hunt; Brenda Kahl; Stylianos E Antonarakis; Simon Tavaré; Panagiotis Deloukas; Emmanouil T Dermitzakis
Journal:  PLoS Genet       Date:  2005-12-16       Impact factor: 5.917

9.  eQTL Viewer: visualizing how sequence variation affects genome-wide transcription.

Authors:  Wei Zou; David L Aylor; Zhao-Bang Zeng
Journal:  BMC Bioinformatics       Date:  2007-01-09       Impact factor: 3.169

10.  Tissue-specific genetic control of splicing: implications for the study of complex traits.

Authors:  Erin L Heinzen; Dongliang Ge; Kenneth D Cronin; Jessica M Maia; Kevin V Shianna; Willow N Gabriel; Kathleen A Welsh-Bohmer; Christine M Hulette; Thomas N Denny; David B Goldstein
Journal:  PLoS Biol       Date:  2008-12-23       Impact factor: 8.029
