
Gene-methylation epistatic analyses via the W-test identifies enriched signals of neuronal genes in patients undergoing lipid-control treatment.

Rui Sun1,2, Haoyi Weng1,2, Ruoting Men1,2, Xiaoxuan Xia1,2, Ka Chun Chong1,2, William K K Wu3, Benny Chung-Ying Zee1,2, Maggie Haitian Wang1,2.   

Abstract

An increasing number of studies are focused on the epigenetic regulation of DNA, which affects gene expression without modifying the DNA sequence. Methylation plays an important role in shaping disease traits; however, previous studies were mainly experiment-based, and few reports have measured gene-methylation interaction effects by statistical means. In this study, we applied the data set adaptive W-test to measure gene-methylation interactions. Performance was evaluated by the ability to detect a given set of causal markers in the data set obtained from the GAW20. Results from simulation data analyses showed that the W-test was able to detect most markers. The method was also applied to chromosome 11 of the experimental data set and identified clusters of genes with neuronal and retinal functions, including MPPED2, GUCY2E, NAV2, and ZBTB16. Genes from the TRIM family were also identified; these genes are potentially related to the regulation of triglyceride levels. Our results suggest that the W-test could be an efficient and effective method to detect gene-methylation interactions. Furthermore, the identified genes suggest an interesting relationship between lipid levels and the etiology of neurological disorders.


Year:  2018        PMID: 30263051      PMCID: PMC6156903          DOI: 10.1186/s12919-018-0143-8

Source DB:  PubMed          Journal:  BMC Proc        ISSN: 1753-6561


Background

Genetic variants, such as single-nucleotide polymorphisms (SNPs), have been found to influence risk for human diseases. Recent studies show that epigenetics affect SNPs in genes and subsequently influence the gene function and disease trait [1]. Epigenetic mechanisms consist of DNA methylation, histone modifications, and noncoding RNAs, all of which represent the patterns of chemical and structural modifications to DNA [2]. There are an increasing number of laboratory experiments that provide evidence of DNA methylation and gene expression regulation [3-5]. Only a few studies, however, have evaluated the genome–epigenome interactions through statistical means, which may potentially provide novel findings for the joint effects of SNPs and cytosine-phosphate-guanine (CpG) sites [6-9]. The search for SNP-CpG epistasis is usually conducted through multistage or integrated analyses, where the genome and methylation data are first analyzed separately and the results then combined [10, 11]. Some studies apply existing interaction-effect methods, such as regressions, to perform the joint analysis of methylation and genome data. The advantages of the W-test method are data set adaptive probability distributions and robustness for complicated genetic architectures, such as moderate data sparsity and population stratifications [12]. By applying the W-test to gene–methylation data directly, epistasis can be measured without a preselection of biomarkers, while also relying less on significant main effects for detecting important CpG–SNP interactions. The GAW20 provided an opportunity to study methylation and genome-wide association study (GWAS) data from participants who have undergone lipid control treatment. The W-test was applied in the detection of gene–methylation interactions, resulting in interesting findings with biological implications.

Methods

GAW20 experimental and simulated data sets

GAW20 provided the study data. The study participants were patients with diabetes who had undergone lipid-control treatment with the drug fenofibrate and were recruited from the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) clinical trial project. The analyzed data consisted of a simulated data set and an experimental data set. The triglyceride (TG) levels were collected at 4 clinical visits, with 2 measurements before treatment and 2 measurements after treatment. Age, sex, smoking status, and location were recorded. Genome-wide association data were genotyped with the Affymetrix Genome-wide Human SNP array 6.0, and DNA methylation profiling was performed with the Illumina Infinium Human Methylation 450 K Bead Chip Array, using the buffy coat harvested from blood samples collected at the second and fourth visits. In the simulated data set, the phenotype was generated from the experimental genetic data under a hypothetical model [13]. The TG levels were generated from 5 SNPs with major effects and 5 CpG sites in physical proximity. A set of 5 SNP-CpG pairs with relatively high heritability but not related to TG levels was given as noise for testing the statistical methods. The simulated data contained 680 subjects after excluding individuals with missing phenotypes. For simulated data, the 84th replicate was used as suggested by GAW20. In the experimental data set, a total of 523 participants had complete genomic and clinical measurements. Participants with missing values were removed during the quality-control process, resulting in a remaining sample size of 476. The method was applied to chromosome 11 of the experimental data.

Defining drug response

The TG levels can be used as a measure of drug response. Because common clinical standards regard a 30% decrease in TG levels as an effective control of lipids [14], we adopted the same criteria in this study. First, the average pretreatment TG levels (TG_pre) were calculated by averaging the measurements from the first and second visits. The average posttreatment TG levels (TG_post) were calculated by averaging the measurements from the third and fourth visits. Next, a percentage of change was calculated as: ΔTG% = (TG_pre–TG_post)/TG_pre. If the percentage of change was greater than 30%, then the drug treatment was defined as effective; if less than 30%, treatment was defined as ineffective. The effectiveness of the drug response was the outcome variable for both the simulated and experimental data.
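The response definition above can be illustrated with a minimal Python sketch. The function name `tg_response` and the example TG values are hypothetical and are not taken from the GAW20 data; the text does not state how a change of exactly 30% is classified, so the sketch assumes it counts as ineffective.

```python
def tg_response(tg_visits, threshold=0.30):
    """Classify drug response from TG measurements at visits 1 to 4.

    Returns (delta, effective), where delta is the fractional decrease
    (TG_pre - TG_post) / TG_pre.
    """
    v1, v2, v3, v4 = tg_visits
    tg_pre = (v1 + v2) / 2.0   # average of the two pretreatment visits
    tg_post = (v3 + v4) / 2.0  # average of the two posttreatment visits
    delta = (tg_pre - tg_post) / tg_pre
    # Assumption: a change of exactly 30% is treated as ineffective.
    return delta, delta > threshold


# Hypothetical participant: TG_pre = 210, TG_post = 135, decrease of about 35.7%
delta, effective = tg_response([200.0, 220.0, 130.0, 140.0])
print(f"{delta:.3f}", effective)
```

The resulting binary label is the outcome variable that the epistasis tests compare between groups.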

The epistasis measure: The W-test

The W-test measures the probability distributional differences for a set of biomarkers between 2 groups of participants, such as the 2 drug-response groups [12]. Under an additive genetic model, a SNP variable can be coded into 3 levels by the count of minor alleles. The quantitative CpG variable can be divided into high and low methylation levels by two-means clustering. A SNP-CpG pair therefore forms a genetic combination of 6 categories. The empirical distributions are compared through a sum of the squared log odds ratios:

$$W = h \sum_{i=1}^{k} \left[ \log\left( \frac{\hat{p}_{1i}/(1-\hat{p}_{1i})}{\hat{p}_{0i}/(1-\hat{p}_{0i})} \right) \Big/ SE_i \right]^2 \sim \chi^2_f ,$$

where $\hat{p}_{1i}$ and $\hat{p}_{0i}$ are the proportions of cases and controls in the i-th category out of total cases or controls, respectively, k is the number of categories (6 for a SNP-CpG pair), and $SE_i$ is the standard error of the log odds ratio. The test statistic follows a chi-squared distribution with f degrees of freedom. The two parameters, h and f, are estimated using a large-sample approximation by drawing smaller bootstrap samples under the null hypothesis. Consequently, the testing distribution is robust for complicated genetic architectures, as it adaptively adjusts to the data structure of the working data [12]. To detect cis-regulation patterns in the gene-methylation data, the SNPs and CpG sites located within a 10-kb genome distance on chromosome 11 were evaluated exhaustively [1].

Two types of logistic-regression models were applied as accompanying benchmarks to the W-test. The first logistic-regression model, LR-m1, treated the CpG site as a binary variable, as in the W-test, and the second logistic-regression model, LR-m2, included the CpG site as a continuous variable using the original methylation values. Both logistic-regression models incorporated the main and interaction effects of the SNP and CpG site. In short, we denote: LR-m1: Y = SNP + CpG + SNP × CpG, where CpG is a binary variable; LR-m2: Y = SNP + CpG + SNP × CpG, where CpG is a continuous variable.
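As an illustration of the statistic described in this section, the following Python sketch dichotomizes a CpG variable by two-means clustering, forms the 6 SNP-CpG categories, and sums the squared standardized log odds ratios. It is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the standard error uses the usual Woolf formula for a 2 × 2 table, empty cells are handled with a 0.5 continuity correction, and the bootstrap estimation of h and f is omitted (h is passed in as a parameter).

```python
import math


def two_means_labels(values, max_iter=100):
    """Split a 1-D sample into low (0) and high (1) groups by 2-means clustering."""
    lo, hi = min(values), max(values)  # initial cluster centres
    for _ in range(max_iter):
        cut = (lo + hi) / 2.0
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        if not high:  # all values identical: nothing to split
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if new_lo == lo and new_hi == hi:  # centres have converged
            break
        lo, hi = new_lo, new_hi
    cut = (lo + hi) / 2.0
    return [1 if v > cut else 0 for v in values]


def pair_counts(snp, cpg_bin, y):
    """Count cases (y=1) and controls (y=0) in the 6 SNP x CpG categories."""
    cases, controls = [0] * 6, [0] * 6
    for g, m, resp in zip(snp, cpg_bin, y):
        idx = 2 * g + m  # g in {0,1,2} minor-allele count, m in {0,1}
        (cases if resp == 1 else controls)[idx] += 1
    return cases, controls


def w_statistic(cases, controls, h=1.0, continuity=0.5):
    """W = h * sum_i (log OR_i / SE_i)^2 over the k categories.

    Assumptions: Woolf standard error; `continuity` is added to every
    cell of a 2x2 table that contains a zero.
    """
    n1, n0 = sum(cases), sum(controls)
    w = 0.0
    for n1i, n0i in zip(cases, controls):
        # 2x2 table: category i versus all other categories
        a, b, c, d = n1i, n1 - n1i, n0i, n0 - n0i
        if min(a, b, c, d) == 0:
            a, b, c, d = (x + continuity for x in (a, b, c, d))
        log_or = math.log((a * d) / (b * c))
        se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
        w += (log_or / se) ** 2
    return h * w


# Hypothetical methylation values: three low, three high
labels = two_means_labels([0.10, 0.12, 0.11, 0.80, 0.85, 0.90])
print(labels)
```

A p value would then be obtained from the upper tail of a chi-squared distribution with f degrees of freedom, with h and f estimated from bootstrap samples as described in [12].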
The type I error rate is an average false-positive proportion using a permuted phenotype on a pair of gene–methylation markers in 2000 replicates. A total of 140,501 epistatic pairs were tested, and a Bonferroni correction resulted in a significance level of 3.56E-7 at a family-wise error rate of 5%.
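The cis-pair enumeration and the Bonferroni threshold can be sketched as follows. The function names, marker identifiers, and positions are hypothetical; treating a distance of exactly 10 kb as inside the window is an assumption, since the text does not specify whether the bound is inclusive.

```python
def cis_pairs(snp_pos, cpg_pos, window_bp=10_000):
    """List (snp_id, cpg_id) pairs whose positions differ by at most window_bp.

    snp_pos and cpg_pos map marker IDs to base-pair positions on one chromosome.
    """
    return [
        (snp_id, cpg_id)
        for snp_id, sp in snp_pos.items()
        for cpg_id, cp in cpg_pos.items()
        if abs(sp - cp) <= window_bp  # assumption: inclusive bound
    ]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error rate."""
    return alpha / n_tests


# Hypothetical positions: only rsA and cg1 lie within 10 kb of each other
pairs = cis_pairs({"rsA": 1_000, "rsB": 50_000}, {"cg1": 9_000, "cg2": 61_000})
print(pairs)

# Threshold for the 140,501 pairs tested at a family-wise error rate of 5%
threshold = bonferroni_threshold(0.05, 140_501)
print(f"{threshold:.2E}")
```

The brute-force double loop is adequate for illustration; a chromosome-scale scan would more plausibly sort both position lists and sweep a window across them.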

Results

Performance of the W-test with simulated data

In the simulated data set, the W-test, LR-m1, and LR-m2 were applied to the given causal and noise pairs. Table 1 displays the p values obtained from the 3 methods. Generally, the W-test gave smaller p values than LR-m1 in most answer pairs, and had p values comparable to those of LR-m2. The top 3 answer pairs were all identified as significant by the 3 methods. The W-test also found the fourth answer pair (cg00045910, rs10828412) with a p value = 0.0475, which was slightly smaller than the p values from LR-m1 (p value = 0.0532) and LR-m2 (p value = 0.0597). The results suggested that the W-test could be sensitive to small signals with lower heritability. For the noise pairs, all methods yielded p values greater than 0.05. The type I error rate of the W-test was 2.95%, less than the 5% level, whereas the type I error rates of LR-m1 and LR-m2 were 5.40% and 5.43%, respectively. The results showed that the W-test was able to distinguish between signal and noise in the simulated data set.
Table 1

p Values of 5 answers and 5 noises by the W-test and the logistic regression models LR-m1 and LR-m2 in simulated data

Type    No  CpG         SNP         Heritability  Chr  p Value (W-test)  p Value (LR-m1)  p Value (LR-m2)
Answer  1   cg00000363  rs9661059   0.125         1    4.93E-5           1.88E-4          2.37E-5
Answer  2   cg10480950  rs736004    0.075         6    6.61E-4           2.17E-3          3.72E-4
Answer  3   cg18772399  rs1012116   0.1           8    7.67E-4           2.04E-4          8.24E-4
Answer  4   cg00045910  rs10828412  0.025         10   4.75E-2           5.32E-2          5.97E-2
Answer  5   cg01242676  rs4399565   0.05          17   3.76E-1           6.33E-1          4.95E-1
Noise   6   cg00703276  rs2953763   -             3    5.11E-1           1.84E-1          1.32E-1
Noise   7   cg01971676  rs6960763   -             7    6.30E-1           6.72E-1          4.19E-1
Noise   8   cg11736230  rs2494731   -             14   1.61E-1           2.06E-1          1.10E-1
Noise   9   cg00001261  rs4786421   -             16   4.18E-1           1.46E-1          5.56E-1
Noise   10  cg12598270  rs323312    -             18   7.33E-1           8.03E-1          4.19E-1

Computing time

Computing time was measured on a laptop computer with a 1.6 GHz processor and 4 GB of random access memory, using 2000 replicates on 1 pair of markers. The W-test was about 4 times faster than logistic regression (2.28 s for the W-test, 10.12 s for LR-m1, and 9.37 s for LR-m2).

Identification of gene–methylation interaction in experimental data

The W-test was applied to test the gene–methylation interactions for the GAW20 experimental data on chromosome 11. No interaction pair passed the Bonferroni-corrected significance level of 3.56E-07 (Table 2). We checked the functions of the top 15 identified epistatic pairs and found interesting biological implications. The top 3 SNP-CpG pairs all resided in the gene MPPED2 (11p14.1; p value = 8E-06), which encodes the protein metallophosphoesterase and has been reported to be related to neuronal function [15]. Previous GWAS studies and biomedical experiments reported that MPPED2 was associated with chronic kidney disease, and knockdown of this gene in zebrafish embryos suggested a role for it in renal function [16]. GUCY2E was ranked fourth and has been reported to function in the central nervous system and retina [17, 18]. NAV2 (ranked 6th; p value = 1.78E-05) is a neuron navigator that mediates all-trans retinoic acid-induced neurite outgrowth, and plays an essential role in the development of the cranial nerve and the regulation of blood pressure in humans [19]. ZBTB16 at 11q23.2 (ranked 15th; p value = 7.04E-05) has also been reported as an inhibitor of neurite outgrowth in the adult central nervous system [20]. Other genes in the top 15 identified pairs include TRIM5, TRIM6-TRIM34, and TRIM3 (smallest p value = 5.22E-05), which were highly correlated with TG levels in mice [21]. The quantile–quantile (Q-Q) plot of the gene–methylation tests showed no inflation in spurious relations for the experimental data (Fig. 1).
Table 2

Top 15 gene–methylation pairs identified by the W-test in experimental data (a)

Rank  SNP         CpG         Distance (kb)  Gene                      MAF    p Value
1     rs12288568  cg13342435  1.27           MPPED2                    0.003  7.49E-06
2     rs11031153  cg13342435  3.86           MPPED2                    0.003  7.49E-06
3     rs16921036  cg13342435  1.35           MPPED2                    0.001  8.68E-06
4     rs11237066  cg13340272  4.52           GUCY2E                    0.120  1.57E-05
5     rs7119411   cg17432267  3.75           C11orf63                  0.430  1.65E-05
6     rs11025246  cg08550026  8.63           NAV2                      0.395  1.78E-05
7     rs4347345   cg16454587  2.50           -                         0.016  2.78E-05
8     rs16927166  cg04054921  5.60           TNNT3                     0.007  3.94E-05
9     rs2165313   cg11007153  2.43           B3GAT1                    0.237  4.06E-05
10    rs11025246  cg04916810  9.60           NAV2                      0.395  4.86E-05
11    rs3740996   cg23217386  4.60           TRIM5;TRIM6-TRIM34;TRIM3  0.131  5.22E-05
12    rs16921012  cg13342435  7.99           MPPED2                    0.001  5.86E-05
13    rs10895360  cg03879971  5.78           LOC100128088              0.024  6.04E-05
14    rs900865    cg23454003  0.87           -                         0.464  6.17E-05
15    rs1455650   cg25744613  8.27           ZBTB16                    0.155  7.04E-05

(a) Bonferroni-corrected significance threshold: 3.56E-07

Fig. 1

Q-Q plot of gene–methylation interaction using experimental data


Discussion and conclusions

There has been increasing evidence for the contribution of epigenetics in regulating the expression of genes implicated in diseases. Previous studies mainly examined gene–methylation interactions experimentally. In this study, we demonstrated that the W-test can be used as an effective method to identify the epistatic interactions between SNPs and CpG sites in the GAW20 simulated and experimental data sets. One common obstacle in the analysis of epistasis in the genome and epigenome is the large number of pairwise tests, the volume of which is determined by the size of the cis-regulatory region. Existing methods address the challenge with stage-wise and integrated analyses, in which the SNPs are first selected separately and the epistatic interactions with CpG sites are then jointly evaluated in regression-based approaches [10, 11]. The stage-wise analysis may miss markers that have weak main effects but strong epistatic effects. Previous studies also assumed a linear relationship between the epistatic pairs and a transformed form of the response variable, while having the advantages of covariate and population structure control. Some nonparametric methods, such as the Mann-Whitney U-test, have been applied to the analysis of methylation data [22]. However, these nonparametric tests cannot handle potentially complicated genetic architectures such as sparse data or population stratification. The W-test has the advantage of being model-free and does not assume any form of interaction effect. It also follows a chi-squared distribution in which the degrees of freedom are estimated from the working data by bootstrap sampling. In this way, the W-test is able to correct potential bias of the probability distribution caused by complicated data structures. The method is efficient enough to be applied directly to SNP-CpG evaluations without prior filtering on main effects.
Application of this method to the experimental data from patients who had undergone treatment for managing TG levels via fenofibrate identified genes that play roles in renal function, the central nervous system, and retinal functions. The enriched signals found in neuronal-related genes suggest that blood lipid levels could be related to neurological dysfunction in the brain, which is the most cholesterol-rich organ in the body. By performing an epistatic evaluation between SNPs and CpG sites, we identified MPPED2, GUCY2E, NAV2, and ZBTB16 as associated with hyperlipidemia. Among these 4 genes, MPPED2 was the most significant; it plays a role in neural development, and genetic variations in this gene are reported to be related to migraine, a common disease of the nervous system [23]. Furthermore, mutations of GUCY2E are reported to be related to retinal disorders [24, 25]. ZBTB16 encodes a protein that is highly expressed in the brain, and polymorphisms in this gene are used as a marker for attention deficit hyperactivity disorder, a neuropsychiatric condition [26]. It is intriguing to note that the enriched signals in neuronal and retinal genes were identified through epistasis evaluation between SNPs and CpG sites, but not through separate analysis of the main effects in those data sets. This sheds light on the importance of integrated analysis of omics data: considering multiple facets or measurements of a common object may improve the chance of catching the underlying signal. Further studies on these threads are necessary to discover the underlying biological mechanism.
  25 in total

Review 1.  Fenofibrate. A review of its pharmacodynamic and pharmacokinetic properties and therapeutic use in dyslipidaemia.

Authors:  J A Balfour; D McTavish; R C Heel
Journal:  Drugs       Date:  1990-08       Impact factor: 9.546

2.  Copy number variation influences gene expression and metabolic traits in mice.

Authors:  Luz D Orozco; Shawn J Cokus; Anatole Ghazalpour; Leslie Ingram-Drake; Susanna Wang; Atila van Nas; Nam Che; Jesus A Araujo; Matteo Pellegrini; Aldons J Lusis
Journal:  Hum Mol Genet       Date:  2009-07-31       Impact factor: 6.150

3.  SNPs located at CpG sites modulate genome-epigenome interaction.

Authors:  Degui Zhi; Stella Aslibekyan; Marguerite R Irvin; Steven A Claas; Ingrid B Borecki; Jose M Ordovas; Devin M Absher; Donna K Arnett
Journal:  Epigenetics       Date:  2013-06-28       Impact factor: 4.528

4.  Meta-analysis of 375,000 individuals identifies 38 susceptibility loci for migraine.

Authors:  Padhraig Gormley; Verneri Anttila; Bendik S Winsvold; Priit Palta; Tonu Esko; Tune H Pers; Kai-How Farh; Ester Cuenca-Leon; Mikko Muona; Nicholas A Furlotte; Tobias Kurth; Andres Ingason; George McMahon; Lannie Ligthart; Gisela M Terwindt; Mikko Kallela; Tobias M Freilinger; Caroline Ran; Scott G Gordon; Anine H Stam; Stacy Steinberg; Guntram Borck; Markku Koiranen; Lydia Quaye; Hieab H H Adams; Terho Lehtimäki; Antti-Pekka Sarin; Juho Wedenoja; David A Hinds; Julie E Buring; Markus Schürks; Paul M Ridker; Maria Gudlaug Hrafnsdottir; Hreinn Stefansson; Susan M Ring; Jouke-Jan Hottenga; Brenda W J H Penninx; Markus Färkkilä; Ville Artto; Mari Kaunisto; Salli Vepsäläinen; Rainer Malik; Andrew C Heath; Pamela A F Madden; Nicholas G Martin; Grant W Montgomery; Mitja I Kurki; Mart Kals; Reedik Mägi; Kalle Pärn; Eija Hämäläinen; Hailiang Huang; Andrea E Byrnes; Lude Franke; Jie Huang; Evie Stergiakouli; Phil H Lee; Cynthia Sandor; Caleb Webber; Zameel Cader; Bertram Muller-Myhsok; Stefan Schreiber; Thomas Meitinger; Johan G Eriksson; Veikko Salomaa; Kauko Heikkilä; Elizabeth Loehrer; Andre G Uitterlinden; Albert Hofman; Cornelia M van Duijn; Lynn Cherkas; Linda M Pedersen; Audun Stubhaug; Christopher S Nielsen; Minna Männikkö; Evelin Mihailov; Lili Milani; Hartmut Göbel; Ann-Louise Esserlind; Anne Francke Christensen; Thomas Folkmann Hansen; Thomas Werge; Jaakko Kaprio; Arpo J Aromaa; Olli Raitakari; M Arfan Ikram; Tim Spector; Marjo-Riitta Järvelin; Andres Metspalu; Christian Kubisch; David P Strachan; Michel D Ferrari; Andrea C Belin; Martin Dichgans; Maija Wessman; Arn M J M van den Maagdenberg; John-Anker Zwart; Dorret I Boomsma; George Davey Smith; Kari Stefansson; Nicholas Eriksson; Mark J Daly; Benjamin M Neale; Jes Olesen; Daniel I Chasman; Dale R Nyholt; Aarno Palotie
Journal:  Nat Genet       Date:  2016-06-20       Impact factor: 38.330

5.  Genome-wide analysis of attention deficit hyperactivity disorder in Norway.

Authors:  Tetyana Zayats; Lavinia Athanasiu; Ida Sonderby; Srdjan Djurovic; Lars T Westlye; Christian K Tamnes; Tormod Fladby; Heidi Aase; Pål Zeiner; Ted Reichborn-Kjennerud; Per M Knappskog; Gun Peggy Knudsen; Ole A Andreassen; Stefan Johansson; Jan Haavik
Journal:  PLoS One       Date:  2015-04-13       Impact factor: 3.240

Review 6.  Impacts of the retinal environment and photoreceptor type on functional regeneration.

Authors:  Michèle G DuVal; W Ted Allison
Journal:  Neural Regen Res       Date:  2017-03       Impact factor: 5.135

7.  The interaction of genetic variants and DNA methylation of the interleukin-4 receptor gene increase the risk of asthma at age 18 years.

Authors:  Nelís Soto-Ramírez; Syed Hasan Arshad; John W Holloway; Hongmei Zhang; Eric Schauberger; Susan Ewart; Veeresh Patil; Wilfried Karmaus
Journal:  Clin Epigenetics       Date:  2013-01-03       Impact factor: 6.551

8.  Gene Therapy Fully Restores Vision to the All-Cone Nrl(-/-) Gucy2e(-/-) Mouse Model of Leber Congenital Amaurosis-1.

Authors:  Sanford L Boye; James J Peterson; Shreyasi Choudhury; Seok Hong Min; Qing Ruan; K Tyler McCullough; Zhonghong Zhang; Elena V Olshevskaya; Igor V Peshenko; William W Hauswirth; Xi-Qin Ding; Alexander M Dizhoor; Shannon E Boye
Journal:  Hum Gene Ther       Date:  2015-08-06       Impact factor: 4.793

9.  Many obesity-associated SNPs strongly associate with DNA methylation changes at proximal promoters and enhancers.

Authors:  Sarah Voisin; Markus Sällman Almén; Galina Y Zheleznyakova; Lina Lundberg; Sanaz Zarei; Sandra Castillo; Fia Ence Eriksson; Emil K Nilsson; Matthias Blüher; Yvonne Böttcher; Peter Kovacs; Janis Klovins; Mathias Rask-Andersen; Helgi B Schiöth
Journal:  Genome Med       Date:  2015-10-08       Impact factor: 11.117

10.  An integrated genetic-epigenetic analysis of schizophrenia: evidence for co-localization of genetic associations and differential DNA methylation.

Authors:  Eilis Hannon; Emma Dempster; Joana Viana; Joe Burrage; Adam R Smith; Ruby Macdonald; David St Clair; Colette Mustard; Gerome Breen; Sebastian Therman; Jaakko Kaprio; Timothea Toulopoulou; Hilleke E Hulshoff Pol; Marc M Bohlken; Rene S Kahn; Igor Nenadic; Christina M Hultman; Robin M Murray; David A Collier; Nick Bass; Hugh Gurling; Andrew McQuillin; Leonard Schalkwyk; Jonathan Mill
Journal:  Genome Biol       Date:  2016-08-30       Impact factor: 13.583
