eSNPO: An eQTL-based SNP Ontology and SNP functional enrichment analysis platform.

Jin Li, Limei Wang, Tao Jiang, Jizhe Wang, Xue Li, Xiaoyan Liu, Chunyu Wang, Zhixia Teng, Ruijie Zhang, Hongchao Lv, Maozu Guo.

Abstract

Genome-wide association studies (GWASs) have identified many common genetic variants associated with complex human traits such as diseases. Functional annotation and enrichment analysis of the significant SNPs are important follow-up tasks. Conventional methods typically rely on the physical positions of SNPs and genes. Expression quantitative trait loci (eQTLs) are genomic loci that contribute to variation in gene expression levels and have proven effective for connecting SNPs and genes. In this work, we integrated eQTL data and Gene Ontology (GO), constructed associations between SNPs and GO terms, and then performed functional enrichment analysis. Finally, we constructed an eQTL-based SNP Ontology and SNP functional enrichment analysis platform. A case study on Parkinson disease (PD) suggests that the proposed platform and method are efficient. We believe eSNPO will be a useful resource for SNP functional annotation and enrichment analysis once significant disease-related SNPs have been identified.

Year:  2016        PMID: 27470167      PMCID: PMC4965794          DOI: 10.1038/srep30595

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


A genome-wide association study (GWAS) examines many common genetic variants across individuals to determine whether any variant is associated with a trait. GWASs typically focus on associations between single nucleotide polymorphisms (SNPs) and traits such as major complex diseases [1]. Since two SNPs with significantly altered allele frequencies between age-related macular degeneration (ARMD) patients and healthy controls were first reported in 2005 [2], more than 100,000 risk SNPs associated with hundreds of human diseases have been identified via GWAS [3]. Several GWAS databases exist for human diseases and traits, such as GWAS Catalog [3], GWAS Central [4] and GWASdb [5, 6].

Once significant SNPs have been obtained, functional analysis is an important task. SNPs are generally considered to act through related genes, and the most popular approach is SNP functional enrichment analysis. Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes [7, 8]. There are several SNP functional databases, such as SNP Function Portal [9] and the F-SNP database [10], and several SNP functional enrichment analysis methods, such as I-GSEA4GWAS [11], SNP-based pathway enrichment analysis [12], SNPsnap [13] and SNP2GO [14]. As with gene functional enrichment analysis, these methods fall into two categories: methods based on significant SNPs and methods based on SNP sets. What they share is that SNP functions are explained by related genes according to physical positions on the chromosome.

Expression quantitative trait loci (eQTLs) are genomic loci that contribute to variation in mRNA expression levels [15]. The first genome-wide eQTL study was carried out in yeast and published in 2002 [16]. Many eQTL studies followed in plants and animals, including humans. Studies have shown that SNPs reproducibly associated with complex disorders are significantly enriched for eQTLs relative to frequency-matched SNPs [17]. Systematic integration of eQTLs and GWAS has been used to identify risk genes in schizophrenia [18], psoriasis [19] and muscle traits [20]. eQTL data are therefore an important and useful source for SNP functional annotation. In this study, using eQTLs as the link between SNPs and their functions, we integrated eQTL and GO information to construct a human SNP Ontology database and a SNP functional enrichment analysis platform. It is intended as an efficient tool for follow-up analysis after a GWAS of a complex trait.

Material and Methods

eQTL data

The eQTL data were collected from several open databases and the literature. Gene expression patterns are tissue specific, and so are eQTL patterns, so classification by tissue type is necessary. We classified the data into 12 tissues (Table 1) and combined data from different studies of the same tissue type. For each dataset, we set a significance threshold of FDR < 0.05. We retained only SNPs with reference names and genes with gene symbols. The numbers of samples, SNPs and genes reported for each tissue type are those remaining after this screening.
Table 1

eQTL data in 12 tissues.

Tissue type            Samples   SNPs      Genes   Reference
Adipose Subcutaneous   111       18963     241     gtexportal [52]
Artery Tibial          124       28332     372     gtexportal [52]
Brain                  765       22740     7161    eQTL Browser [53], seeQTL [54]
Heart                  87        14086     186     gtexportal [52]
Lung                   124       31905     434     gtexportal [52]
Muscle Skeletal        143       25383     301     gtexportal [52]
Nerve Tibial           102       23253     327     gtexportal [52]
Skin                   114       20506     296     gtexportal [52]
Blood                  5479      406341    6780    Blood eQTL browser [55], gtexportal [52]
Liver                  427       2305      3463    eQTL Browser [53]
Lymphoblastoid         1220      208039    9168    eQTL Browser [53], seeQTL [54], Liming Liang [56]
Thyroid                112       33939     481     gtexportal [52]
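
The screening steps described in this section (FDR threshold, named SNPs and genes only, merging studies of the same tissue) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record keys (`snp`, `gene`, `tissue`, `fdr`), the `rs` prefix check used as a stand-in for "SNPs with reference names", and the sample records are all hypothetical.

```python
from collections import defaultdict

FDR_THRESHOLD = 0.05  # significance threshold stated in the text


def screen_eqtls(records):
    """Group significant SNP-gene pairs by tissue.

    `records` is an iterable of dicts with the hypothetical keys
    'snp', 'gene', 'tissue' and 'fdr'. Pairs from different studies of
    the same tissue are merged; duplicate pairs collapse in the set.
    """
    by_tissue = defaultdict(set)
    for rec in records:
        snp, gene = rec.get("snp"), rec.get("gene")
        if rec["fdr"] >= FDR_THRESHOLD:
            continue  # keep only FDR < 0.05
        if not snp or not snp.startswith("rs"):
            continue  # assumption: a "reference name" is an rs identifier
        if not gene:
            continue  # drop records without a gene symbol
        by_tissue[rec["tissue"]].add((snp, gene))
    return dict(by_tissue)


def summarize(by_tissue):
    """Return {tissue: (number of SNPs, number of genes)} after screening."""
    return {
        tissue: (len({s for s, _ in pairs}), len({g for _, g in pairs}))
        for tissue, pairs in by_tissue.items()
    }


# Made-up records standing in for two studies of the same tissues.
records = [
    {"snp": "rs1", "gene": "GENE_A", "tissue": "Brain", "fdr": 0.01},
    {"snp": "rs1", "gene": "GENE_A", "tissue": "Brain", "fdr": 0.02},  # duplicate pair
    {"snp": "rs2", "gene": "GENE_B", "tissue": "Brain", "fdr": 0.20},  # not significant
    {"snp": "chr1:12345", "gene": "GENE_C", "tissue": "Brain", "fdr": 0.01},  # no rs ID
    {"snp": "rs3", "gene": "", "tissue": "Liver", "fdr": 0.001},  # no gene symbol
    {"snp": "rs4", "gene": "GENE_D", "tissue": "Liver", "fdr": 0.03},
]
screened = screen_eqtls(records)
print(summarize(screened))
```

Storing pairs in a set per tissue makes the merge across studies and the removal of duplicates a single step.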

Brain data

As Parkinson disease (PD) is a disorder of the central nervous system, we selected brain eQTL data from Gibbs et al. [21] and Myers et al. [22] for the case study. In the study by Gibbs et al., frozen tissue samples from four brain regions, the cerebellum (CRBLM), frontal cortex (FCTX), caudal pons (PONS) and temporal cortex (TCTX), were obtained from 150 neurologically normal Caucasian subjects, giving 600 tissue samples. SNP genotyping was performed using Infinium HumanHap 550 beadchips (Illumina) for 561,466 SNPs. Profiling of 22,184 mRNA transcripts was performed using HumanRef-8 Expression BeadChips (Illumina). For each of the four brain regions, a regression analysis was performed using Plink [23]. After eQTL analysis in each brain region, we integrated the results. In the study by Myers et al., whole-genome genotyping for 366,140 SNPs and expression analysis of 14,078 genes were carried out on a series of 193 neurologically normal human brain samples using the Affymetrix GeneChip Human Mapping 500 K Array Set and Illumina HumanRefseq-8 Expression BeadChip platforms. A one-degree-of-freedom allelic test of association was performed using Plink [23]. We integrated the results from these two studies and obtained 51,131 significant associations between 22,740 SNPs and 7,161 genes at a threshold of FDR < 0.05.

Gene annotation data

The gene annotation data were downloaded from the Gene Ontology (GO) database (www.geneontology.org/page/download-annotations) [7, 8].

eSNPO construction

We defined associations between SNPs and GO terms by combining the SNP-gene associations from the eQTL data with the gene-GO term associations from the GO annotation database. A SNP and a GO term are associated if they share at least one gene, as illustrated in Fig. 1.
Figure 1

eSNPO construction.
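
The construction rule (a SNP and a GO term are linked when they share at least one gene) amounts to a join of two pair lists on the gene. The sketch below is illustrative only; the function name, the input layout and the identifiers `rs1`, `GENE_A`, `GO_TERM_1` and so on are placeholders, not entries from eSNPO.

```python
from collections import defaultdict


def build_snp_go(snp_gene_pairs, gene_go_pairs):
    """Link each SNP to every GO term that shares at least one gene with it."""
    go_by_gene = defaultdict(set)
    for gene, go_term in gene_go_pairs:
        go_by_gene[gene].add(go_term)

    snp_go = defaultdict(set)
    for snp, gene in snp_gene_pairs:
        # Genes without any GO annotation contribute nothing.
        snp_go[snp].update(go_by_gene.get(gene, ()))
    # Drop SNPs that ended up with no GO term at all.
    return {snp: terms for snp, terms in snp_go.items() if terms}


# Placeholder eQTL pairs (SNP, gene) and annotation pairs (gene, GO term).
snp_gene = [("rs1", "GENE_A"), ("rs1", "GENE_B"), ("rs2", "GENE_B"), ("rs3", "GENE_C")]
gene_go = [("GENE_A", "GO_TERM_1"), ("GENE_B", "GO_TERM_1"), ("GENE_B", "GO_TERM_2")]

snp_go = build_snp_go(snp_gene, gene_go)
n_associations = sum(len(terms) for terms in snp_go.values())
print(snp_go, n_associations)
```

Because the GO terms of each SNP are kept in a set, a SNP that reaches the same term through two genes (here `rs1` and `GO_TERM_1`) is counted once, which corresponds to removing redundant associations.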

SNP functional enrichment analysis

We used the Fisher exact test to estimate the significance of associations between SNPs and GO terms. In this one-sided enrichment setting, the Fisher exact test is equivalent to the hypergeometric test. Suppose there are N SNPs in eSNPO, M of which are disease-related. For a given GO term, there are n annotated SNPs, m of which are disease-related. The p value is the hypergeometric probability of observing at least m disease-related SNPs among the n SNPs of the term.
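
With N, M, n and m as defined above, and assuming the usual one-sided (upper-tail) form of the test, the p value can be written as P = sum over i from m to min(n, M) of C(M, i) * C(N - M, n - i) / C(N, n), where C(a, b) is the binomial coefficient. The sketch below evaluates this sum with exact integer arithmetic; the function name and the small example counts are illustrative assumptions, not values from the study.

```python
from math import comb


def enrichment_p_value(N, M, n, m):
    """Upper-tail hypergeometric probability P(X >= m).

    N: SNPs in the background, M: disease-related SNPs in the background,
    n: SNPs annotated to the GO term, m: disease-related SNPs in the term.
    """
    if not (0 <= M <= N and 0 <= n <= N and 0 <= m <= min(n, M)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations add nothing to the sum.
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, min(n, M) + 1))
    return tail / total


# Toy example: 10 SNPs, 4 disease-related; a term with 3 SNPs, 2 disease-related.
p = enrichment_p_value(N=10, M=4, n=3, m=2)
print(p)
```

For the toy counts the sum is (6 * 6 + 4 * 1) / 120 = 1/3. Exact integers avoid rounding problems in the binomial coefficients, at the cost of speed for very large N.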

P value adjustment

Because many GO terms are tested in one analysis, the chance of Type I errors increases, so a multiple-testing adjustment is needed after the p values are estimated. Seven p value adjustment options are provided through the p.adjust function in R. The Bonferroni correction (“bonferroni”) [24] multiplies the p values by the number of comparisons. Less conservative corrections are also included: Holm (“holm”) [25], Hochberg (“hochberg”) [26], Hommel (“hommel”) [27], Benjamini & Hochberg (“BH” or its alias “fdr”) [28], and Benjamini & Yekutieli (“BY”) [29]. There is no gold standard for comparing these methods; the most popular is the False Discovery Rate method. The False Discovery Rate (FDR) is one way of conceptualizing the rate of Type I errors in null hypothesis testing when conducting multiple comparisons. In this study, we used the “fdr” method.
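
To make the two ends of this range concrete, the sketch below implements the Bonferroni and Benjamini & Hochberg adjustments in plain Python. It is an illustrative re-implementation intended to follow the same definitions as the “bonferroni” and “BH”/“fdr” options, not the R code used by the platform; the function names and example p values are assumptions.

```python
def adjust_bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]


def adjust_bh(p_values):
    """Benjamini & Hochberg adjusted p values, returned in input order."""
    k = len(p_values)
    # Walk from the largest p value down, keeping a running minimum so
    # that adjusted values stay monotone in the original p values.
    order = sorted(range(k), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * k
    running_min = 1.0
    for steps_from_top, idx in enumerate(order):
        rank = k - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, p_values[idx] * k / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
bonf = adjust_bonferroni(raw)
bh = adjust_bh(raw)
print(bonf, bh)
```

On these four values the Bonferroni adjustment gives 0.04, 0.16, 0.12 and 0.02, while the BH adjustment gives 0.02, 0.04, 0.04 and 0.02, which shows how much less conservative the FDR approach is.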

Database

Finally, we constructed a SNP Ontology and SNP functional enrichment analysis platform (http://bioinfo.hrbmu.edu.cn/esnpo/ or http://nclab.hit.edu.cn/esnpo/). It provides two main functions: eQTL-based SNP functional annotation and SNP functional enrichment analysis. After removing redundancy, we obtained 699,445 associations between 21,123 SNPs and 11,714 GO terms (the SNP and GO term counts match the Brain row of Table 2). Detailed statistics for the 12 tissues are given in Table 2. GO terms belong to three categories: Biological Process (BP), Cellular Component (CC) and Molecular Function (MF).
Table 2

Summary statistics of eSNPO.

Tissue                 SNPs      GO terms   BP     CC     MF
Adipose Subcutaneous   6168      304        166    75     63
Artery Tibial          8472      478        286    92     100
Brain                  21123     11714      7979   1168   2567
Heart Left Ventricle   3049      294        171    59     64
Lung                   8315      530        327    90     113
Muscle Skeletal        8637      501        318    96     87
Nerve Tibial           7210      533        334    89     110
Skin                   4240      358        213    87     58
Blood                  353817    11153      7514   1136   2503
Liver                  1976      7762       5257   859    1646
Lymphoblastoid         184971    12158      8238   1174   2746
Thyroid                8258      637        385    113    139

Case study

PD SNPs data

PD is a degenerative disorder of the central nervous system that mainly affects the motor system. We used the 2,034 unique PD-related SNPs compiled by Guiyou Liu et al. [30]. These SNPs came from the following studies: 41 SNPs from the GWAS Catalog [3]; 70 SNPs from a large PD GWAS with over 3,400 cases and 29,000 controls conducted by Do et al. [31]; 783 SNPs from a meta-analysis of PD GWAS with 4,238 PD cases and 4,239 controls performed by Pankratz et al. [32]; and 1,292 SNPs from a meta-analysis of PD GWAS using a common set of 7,893,274 variants across 13,708 cases and 95,282 controls conducted by Nalls et al. [33]. The p value threshold in these studies was 5.00E−08. After removing redundancy, 2,034 unique SNPs with P < 5.00E−08 remained.

PD enrichment analysis

In the eQTL-based SNP enrichment analysis, 846 of the 2,034 SNPs were annotated to 77 terms. After the Fisher exact test, 67 terms (87.0%) were significant at fdr < 0.01. In the position-based SNP enrichment analysis, 1,318 of the 2,034 SNPs were annotated to 807 terms, of which 396 (49.1%) were significant at fdr < 0.01. The significant results from eSNPO and the position-based analysis share 43 terms: 19 Biological Process (BP) terms, 14 Cellular Component (CC) terms and 10 Molecular Function (MF) terms. Thus, although eSNPO annotates fewer GO terms than the position-based method, a higher proportion of its results are significant.

To evaluate the method, we checked the significant BP terms against the literature. Of the 19 BP terms shared by the two methods, 5 concern axons or neurons, 5 concern microtubules, 4 concern apoptosis, cell death or autophagy, and 1 concerns pregnancy. Links between PD and axons or neurons [34, 35], microtubules [36, 37, 38], apoptosis [39, 40, 41], cell death [42, 43], autophagy [39, 44] and pregnancy [45, 46] have been reported in other studies.

We further checked the significant GO terms obtained only with the eQTL-based method (8 BP terms, 8 CC terms and 8 MF terms). Of these 8 BP terms, 2 concern the apoptotic signaling pathway [47], 1 concerns cell proliferation [48, 49], 1 concerns cell adhesion [50] and 2 concern JUN phosphorylation [51]; all of these are supported by other studies.
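
The comparison reported in this section reduces to simple proportions and set operations on the two lists of significant terms. The sketch below recomputes the reported percentages from the counts given in the text and shows how shared and method-specific terms could be separated; the function names and the term identifiers `T1` to `T4` are hypothetical.

```python
def significant_fraction(n_significant, n_annotated):
    """Share of annotated GO terms that pass the significance threshold."""
    return n_significant / n_annotated


def compare_methods(eqtl_sig, position_sig):
    """Split significant terms into shared and method-specific sets."""
    return {
        "shared": eqtl_sig & position_sig,
        "eqtl_only": eqtl_sig - position_sig,
        "position_only": position_sig - eqtl_sig,
    }


# Counts taken from the text.
eqtl_share = f"{significant_fraction(67, 77):.1%}"
position_share = f"{significant_fraction(396, 807):.1%}"
eqtl_only_count = 67 - (19 + 14 + 10)  # significant eQTL terms minus shared terms
print(eqtl_share, position_share, eqtl_only_count)

# Placeholder term sets showing the set logic.
split = compare_methods({"T1", "T2", "T3"}, {"T2", "T3", "T4"})
print(split)
```

The 24 eQTL-only terms obtained this way agree with the 8 BP, 8 CC and 8 MF terms listed in the text.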

Conclusion

In this work, we constructed an eQTL-based SNP Ontology and SNP functional enrichment analysis platform (http://bioinfo.hrbmu.edu.cn/esnpo/ or http://nclab.hit.edu.cn/esnpo/). We integrated eQTL data and GO, constructed associations between SNPs and GO terms, and then performed functional enrichment analysis. In the PD case study, the eQTL-based method was as efficient as the position-based method. We therefore believe it is a useful resource for SNP functional enrichment analysis once significant disease-related SNPs have been selected. The method still has some limitations. First, suitable eQTL data may not be available in sufficient quantity. Second, the scale of eSNPO is far smaller than that of the position-based method. These limitations should ease as more eQTL studies are carried out.

Additional Information

How to cite this article: Li, J. et al. eSNPO: An eQTL-based SNP Ontology and SNP functional enrichment analysis platform. Sci. Rep. 6, 30595; doi: 10.1038/srep30595 (2016).
References (10 of 50 shown)

Review 1.  Drugs to prevent cell death in Parkinson's disease. Neuroprotection against oxidative stress and inflammatory gene expression.

Authors:  M B Youdim; E Grünblatt; Y Levites-Royak; S Mandel
Journal:  Adv Neurol       Date:  2001

2.  LRRK2 function on actin and microtubule dynamics in Parkinson disease.

Authors:  Loukia Parisiadou; Huaibin Cai
Journal:  Commun Integr Biol       Date:  2010-09

3.  Axon pathology in Parkinson's disease and Lewy body dementia hippocampus contains alpha-, beta-, and gamma-synuclein.

Authors:  J E Galvin; K Uryu; V M Lee; J Q Trojanowski
Journal:  Proc Natl Acad Sci U S A       Date:  1999-11-09       Impact factor: 11.205

4.  Role of the Akt/GSK-3β/CRMP-2 pathway in axon degeneration of dopaminergic neurons resulting from MPP+ toxicity.

Authors:  Wei Fang; Guodong Gao; Haikang Zhao; Yi Xia; Xiaodong Guo; Nan Li; Yuqian Li; Yang Yang; Lei Chen; Qiang Wang; Lihong Li
Journal:  Brain Res       Date:  2014-08-21       Impact factor: 3.252

Review 5.  Genome-wide association studies for common diseases and complex traits.

Authors:  Joel N Hirschhorn; Mark J Daly
Journal:  Nat Rev Genet       Date:  2005-02       Impact factor: 53.242

6.  Complement factor H variant increases the risk of age-related macular degeneration.

Authors:  Jonathan L Haines; Michael A Hauser; Silke Schmidt; William K Scott; Lana M Olson; Paul Gallins; Kylee L Spencer; Shu Ying Kwan; Maher Noureddine; John R Gilbert; Nathalie Schnetz-Boutaud; Anita Agarwal; Eric A Postel; Margaret A Pericak-Vance
Journal:  Science       Date:  2005-03-10       Impact factor: 47.728

7.  Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain.

Authors:  J Raphael Gibbs; Marcel P van der Brug; Dena G Hernandez; Bryan J Traynor; Michael A Nalls; Shiao-Lin Lai; Sampath Arepalli; Allissa Dillman; Ian P Rafferty; Juan Troncoso; Robert Johnson; H Ronald Zielke; Luigi Ferrucci; Dan L Longo; Mark R Cookson; Andrew B Singleton
Journal:  PLoS Genet       Date:  2010-05-13       Impact factor: 5.917

8.  Meta-analysis of Parkinson's disease: identification of a novel locus, RIT2.

Authors:  Nathan Pankratz; Gary W Beecham; Anita L DeStefano; Ted M Dawson; Kimberly F Doheny; Stewart A Factor; Taye H Hamza; Albert Y Hung; Bradley T Hyman; Adrian J Ivinson; Dmitri Krainc; Jeanne C Latourelle; Lorraine N Clark; Karen Marder; Eden R Martin; Richard Mayeux; Owen A Ross; Clemens R Scherzer; David K Simon; Caroline Tanner; Jeffery M Vance; Zbigniew K Wszolek; Cyrus P Zabetian; Richard H Myers; Haydeh Payami; William K Scott; Tatiana Foroud
Journal:  Ann Neurol       Date:  2012-03       Impact factor: 10.422

9.  GWASdb: a database for human genetic variants identified by genome-wide association studies.

Authors:  Mulin Jun Li; Panwen Wang; Xiaorong Liu; Ee Lyn Lim; Zhangyong Wang; Meredith Yeager; Maria P Wong; Pak Chung Sham; Stephen J Chanock; Junwen Wang
Journal:  Nucleic Acids Res       Date:  2011-12-01       Impact factor: 16.971

10.  Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson's disease.

Authors:  Mike A Nalls; Nathan Pankratz; Christina M Lill; Chuong B Do; Dena G Hernandez; Mohamad Saad; Anita L DeStefano; Eleanna Kara; Jose Bras; Manu Sharma; Claudia Schulte; Margaux F Keller; Sampath Arepalli; Christopher Letson; Connor Edsall; Hreinn Stefansson; Xinmin Liu; Hannah Pliner; Joseph H Lee; Rong Cheng; M Arfan Ikram; John P A Ioannidis; Georgios M Hadjigeorgiou; Joshua C Bis; Maria Martinez; Joel S Perlmutter; Alison Goate; Karen Marder; Brian Fiske; Margaret Sutherland; Georgia Xiromerisiou; Richard H Myers; Lorraine N Clark; Kari Stefansson; John A Hardy; Peter Heutink; Honglei Chen; Nicholas W Wood; Henry Houlden; Haydeh Payami; Alexis Brice; William K Scott; Thomas Gasser; Lars Bertram; Nicholas Eriksson; Tatiana Foroud; Andrew B Singleton
Journal:  Nat Genet       Date:  2014-07-27       Impact factor: 38.330
