Literature DB >> 35456489

Additive and Interactive Genetically Contextual Effects of HbA1c on cg19693031 Methylation in Type 2 Diabetes.

Kelsey Dawes1, Willem Philibert1, Benjamin Darbro2, Ronald L Simons3, Robert Philibert1,4,5.   

Abstract

Type 2 diabetes mellitus (T2D) has a complex genetic and environmental architecture that underlies its development and clinical presentation. Despite the identification of well over a hundred genetic variants and CpG sites that associate with T2D, a robust biosignature that could be used to prevent or forestall clinical disease has not been developed. Based on the premise that underlying genetic variation influences DNA methylation (DNAm) independently of or in combination with environmental exposures, we assessed the ability of local and distal gene x methylation (GxMeth) interactive effects to improve cg19693031 models for predicting T2D status in an African American cohort. Using genome-wide genetic data from 506 subjects, we identified a total of 1476 GxMeth terms associated with HbA1c values. The GxMeth SNPs map to biological pathways associated with the development and complications of T2D, with genetically contextual differences in methylation observed only in diabetic subjects for two GxMeth SNPs (rs2390998 AG vs. GG, p = 4.63 × 10-11, Δβ = 13%, effect size = 0.16 [95% CI = 0.05, 0.32]; rs1074390 AA vs. GG, p = 3.93 × 10-4, Δβ = 9%, effect size = 0.38 [95% CI = 0.12, 0.56]. Using a repeated stratified k-fold cross-validation approach, a series of balanced random forest classifiers with random under-sampling were built to evaluate the addition of GxMeth terms to cg19693031 models to discriminate between normoglycemic controls versus T2D subjects. The results were compared to those obtained from models incorporating only the covariates (age, sex and BMI) and the addition of cg19693031. We found a post-pruned classifier incorporating 10 GxMeth SNPs and cg19693031 adjusted for covariates predicted the T2D status, with the AUC, sensitivity, specificity and precision of the positive target class being 0.76, 0.81, 0.70 and 0.63, respectively. Comparatively, the AUC, sensitivity, specificity and precision using the covariates and cg19693031 were only 0.71, 0.74, 0.67 and 0.59, respectively. Collectively, we demonstrate correcting for genetic confounding of cg19693031 improves its ability to detect type 2 diabetes. We conclude that an integrated genetic-epigenetic approach could inform personalized medicine programming for more effective prevention and treatment of T2D.

Entities:  

Keywords:  DNA methylation; epigenetics; type 2 diabetes

Mesh:

Substances:

Year:  2022        PMID: 35456489      PMCID: PMC9025650          DOI: 10.3390/genes13040683

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.141


1. Introduction

Type 2 diabetes mellitus (T2D) is the seventh leading cause of morbidity and mortality in the US and is responsible for $327 billion annually in economic damage [1]. T2D is diagnosed in the presence of sustained hyperglycemia that is above the threshold that predisposes to microvascular complications [2]. However, hyperglycemia is the end product of several pathophysiological processes that eventually converge on the inability of the pancreatic β cells to produce enough insulin to meet the demands of target tissues. Our inability to identify and adequately understand this etiological heterogeneity underpins the diagnostic and prognostic challenges faced in clinical practice. Thus, novel precision diagnostic methods that refine the characterization of diabetes to optimize prognostication and therapies are highly desired to improve clinical decisions and patient outcomes. T2D has a complex genetic and environmental architecture that underlies its development and clinical presentation [3]. Genome-wide association studies (GWAS) have identified more than 500 common genetic variants associated with the disease. However, in aggregate, these explain less than 20% of the heritability [4]. Other attempts to elucidate the disease etiology include identifying epigenetic alterations that mediate environmental influences on aberrant gene expression. DNA methylation (DNAm) is the most widely studied and characterized epigenetic modification due to its scalability and its potential for clinical translation. Several investigators have demonstrated that the DNAm status at cg19693031 is highly correlated with both HbA1c values and T2D [5,6,7]. However, these prior studies failed to account for potential confounding genetic influences. Given the dynamic cross-talk between genetic variation, DNA methylation and environmental exposures, these factors should not be considered as self-standing layers of pathophysiological regulation but should only be interpreted in light of each other [8]. For example, several genetic variants for common complex diseases are not detected in GWAS analyses when the environmental influences are not included. Thus, this partially informed approach of traditional GWAS/EWAS analyses fails to adequately explain the interconnected network that underlies the development and progression of diabetes, which limits our ability to prevent and treat the disease. In this communication, we use an omni-genic approach in an attempt to explain how the interactions between cg19693031 methylation and common variants in the cellular context of hyperglycemia can increase disease susceptibility. We then implement functional analyses to investigate possible mechanisms explaining the association between the methylation status of TXNIP’s 3′UTR and prevalent T2D. Finally, we examine if correcting for genetic confounding improves the ability of cg19693031 to detect T2D.

2. Materials and Methods

2.1. Subjects and Recruitment

This study uses genomic and biological data from Wave 5 of primary caregivers (PC) of the Family and Community Health Study (FACHS). The overall design and methods used in the FACHS study have been described previously [9]. Each of these procedures was approved by the University of Iowa Institutional Review Board (IRB 200802719). The data from a total of 506 subjects were retained for further analyses. DNA was prepared using standard cold protein precipitation [10]. Glycosylated hemoglobin A1c (HbA1c) levels were determined by the University of Iowa Diagnostic Laboratories (https://medicine.uiowa.edu/uidl/, accessed on 4 February 2018) using turbidimetric immune inhibition [11] in compliance with standard Clinical Laboratory Improvement Amendments (CLIA) procedures [12].

2.2. Genome-Wide DNA Methylation

Genome-wide methylation was determined by the University of Minnesota Genome Center (http://genomics.umn.edu/, accessed on 10 March 2018) using the Infinium MethylationEpic Beadchip (Illumina, San Diego, CA, USA) according to the manufacturer’s protocol. Standard sample and probe level quality control of the resulting data were conducted as previously described [13].

2.3. Genome-Wide Genotypes

Genome-wide genotype information also was determined by the University of Minnesota Genome Center using the Multi-Ethnic Global-8 Beadchip (Illumina, San Diego, CA, USA) according to the manufacturer’s suggested protocol. Quality control was performed at both the sample and single nucleotide polymorphism (SNP) probe levels in PLINK [14]. SNP probes with a minor allele frequency >5% were retained. Linkage disequilibrium-based SNP pruning was performed with a window size of 50 SNPs, window shift of 5 SNPs and a pairwise SNP-SNP LD threshold of 0.5. A total of 516908 variants for 506 subjects passed filters and QC. For the purposes of these analyses, genotypes were recoded as 0, 1 or 2 in PLINK. Genotypes at rs7211, rs7212 and rs9245 were determined by our standard primer probe genotyping procedures using fluorescent hydrolysable probe assays and reagents from ThermoFisher (Waltham, MA, USA) according to manufacturer’s instructions [15].

2.4. Phenotypes

Phenotypes that were considered in the analyses include age, gender, body mass index (BMI) and type 2 diabetes mellitus (T2D) status. The age used was the age of the subject at the time of the Wave 5 interview. Gender was coded as a binary variable representing self-reported biological sex. BMI was calculated using the measured height and weight of the subject (kg/m2) and used as a continuous variable. T2D status was determined for each subject by binning the HbA1c value at the time of the W5 interview. The binning criteria were adapted from the clinical diagnostic guidelines set by the American Diabetes Association; i.e., those with an HbA1c ≤ 5.7 were binned as normoglycemic controls, HbA1c between 5.7–6.5 were binned as pre-diabetic and HbA1c ≥ 6.5 as diabetic [2]. Normoglycemic control subjects self-reporting a T2D diagnosis and any non-diabetic subject self-reporting the use of anti-diabetic pharmacotherapy were removed from the final dataset.

2.5. Identification of HbA1c-Associated DNAm Loci

The identification of DNAm loci associated with HbA1c values was determined by fitting a linear regression model in R v.3.1.2 using the following equation: A total of 861,916 independent analyses were conducted. The DNAm loci with a genome-wide significance after FDR correction (α = 0.05) were retained for further analyses.

2.6. Identification of Stochastic Epimutations

The distribution and variability of methylation values were evaluated using boxplots for each of the identified HbA1c associated DNAm loci. Stochastic epimutations (SEMs) were defined as methylation outliers exceeding three times the interquartile range (IQR), consistent with the definition of Gentilini and colleagues [16]. SEMs were further classified based on the methylation value compared to the average of the population, as previously described by Wang and associates [17]. High methylation outliers (HMO) were defined using the following equation: Similarly, low methylation outliers (LMO) were defined using the following equation: HbA1c-associated DNAm probes containing SEMs were removed from further analyses.

2.7. Identification of HbA1c-Associated Polymorphic CpGs

DNAm probes potentially affected by polymorphisms at the target site were identified using methods previously described. In brief, the Infinium MethylationEPIC BeadChip manifest file was used to identify SNPs found at the target site for both Infinium Type I and Type II probes. To identify those associated with HbA1c values, a linear regression model was fit in R using the following equation: A total of 302,943 independent analyses were conducted. The DNAm loci with a genome-wide significance after FDR correction (α = 0.05) were retained for further analyses.

2.8. Identification of meQTL Effects of cg19693031 Methylation

To understand the genetically contextual effects on cg19693031 methylation, meQTL analyses in both cis and trans were conducted. Linear regression models were fit in R using the following equation: A total of 117 independent cis analyses of potential interaction effects with cg19693031 were conducted. Further, meQTLs acting in cis were defined as SNPs located on chromosome 1 and within 1Mb of cg19693031. A total of 38750 independent long-range cis analyses were conducted, defined as SNPs located on chromosome 1 and located > 1 Mb of cg19693031. A total of 459464 independent trans analyses were conducted, defined as SNPs not residing on chromosome 1. SNPs with nominally significant effects were retained for further analyses (α = 0.05).

2.9. Genetic + Environmental (G + E) Analyses

The additive genetic and environmental effects on cg19693031 methylation were assessed using the following equation in R: SNPs with nominally significant effects were retained for further analyses (α = 0.05).

2.10. Genetic x Environment (GxE) Analyses

The genetically contextual effects of HbA1c on cg19693031 methylation were assessed using the following equation in R: In particular, the interaction term p-value was used for the selection and ranking of SNPs with nominally significant GxE effects. A total of 117, 38,750 and 459,464 independent cis, long-range cis and trans analyses were conducted, respectively.

2.11. Local TXNIP GxMeth Logistic Regression Classifiers for T2D Status

Training and test datasets were prepared to develop and evaluate each classifier for T2D status in Python v.3.8.10 [18]. A total of 137, 260 and 90 normoglycemic controls, prediabetic and diabetic subjects were retained for logistic modeling. These subjects were used to generate the training (70%) and testing (30%) datasets. T2D status was coded as binary variable, with the more severe dysglycemic status reflecting the positive target class. The target class was stratified between the training and testing datasets to reflect the original class ratio. The performance of each model was evaluated using the receiver operating characteristic (ROC) area under the curve (AUC), sensitivity, specificity and precision of the more severe dysglycemic target class.

2.12. Identification of Distal GxMeth Interactions

The identification of HbA1c-associated distal significant Genetic x Methylation (GxMeth) interactions was determined by fitting a linear regression using the following equation in R: Additive and interactive GxMeth terms were evaluated for each SNP that survived data cleaning and filtering, with a total of 614,729 independent analyses conducted. The SNPs of each GxMeth interactive term with genome-wide significance after Bonferroni correction (α = 0.05) were retained for further analyses. The R package CMplot was used to construct the Manhattan plot [19].

2.13. Distal GxMeth Functional SNP Mapping

Each SNP with a significant GxMeth interaction term was annotated using the Multi-Ethnic Global-8 Annotation File. The 560 mapped genes were used to generate a protein–protein interaction (PPI) network using the STRING database [20]. To investigate the functions of the GxMeth genes, functional enrichment analysis was performed using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) [21].

2.14. Integrated Genetic–Epigenetic Balanced Random Forest Classifier for T2D Status

A series of balanced random forest classifiers (BRF) were built using scikit-learn in Python v.3.8.10 to evaluate the predictive ability of integrated GxMeth models for discriminating normoglycemic controls versus diabetic subjects. Each BRF classifier was built using a repeated stratified cross-validation (10 splits and 3 repeats) approach. The dimensionality of the dataset was reduced by taking advantage of the implicit feature selection of the random forest using the SelectFromModel function in sklearn. A BRF classifier was built as previously described to evaluate the selected features for predicting T2D status (normoglycemic controls versus diabetic subjects). The selected features were then ranked according to their Gini importance, with the top 13 features retained.

3. Results

3.1. Clinical and Demographic Characteristics

The key demographic and clinical characteristics of the 506 subjects are given in Table 1. The subjects in each group were almost exclusively African American, and predominantly female. Congruent with the clinical observations, the diabetic subjects were significantly older and more obese than the normoglycemic and prediabetic subjects (p = 1.29 × 10−9 and p = 1.12 × 10−6, respectively). While the systolic and diastolic blood pressures were not significantly different between the groups, there was a higher prevalence of self-reported hypertension and anti-hypertensive therapy in the diabetic subjects. A total of 73 subjects self-reported a history of diabetes, with 70% of these subjects having an HbA1c ≥ 6.5% despite only 4% reporting anti-diabetic pharmacological treatment.
Table 1

Clinical and demographic characteristics of subjects.

ControlPre-DiabeticDiabetic
N = 142N = 274N = 90
Age45.3 ± 8.446.7 ± 7.352.1 ± 9.7
Gender
Male36 (25)69 (25)20 (22)
Female106 (75)205 (75)70 (78)
Ethnicity
African American140 (99)273 (99)90 (100)
White2 (1)--
Hispanic-1 (1)-
HbA1c (%)5.34 ± 0.265.96 ± 0.218.02 ± 1.84
BMI31.91 ± 8.4533.73 ± 7.9137.85 ± 9.53
Systolic BP 133.98 ± 21.92134.15 ± 19.59139.67 ± 23.65
Diastolic BP 83.46 ± 13.2483.59 ± 13.2584.06 ± 15.46
Self-Report Dx
Diabetes-22 (8)51 (57)
Hypertension12 (8)43 (16)17 (19)
Cardiac Event1 (1)2 (1)1 (1)
Self-Report Rx *
Diabetes--3 (3)
Blood Pressure12 (8)39 (14)17 (19)
Cholesterol-2 (1)1(1)

Mean ± standard deviation for continuous variables. N (%) for categorical variables. † BP represents blood-pressure (mm Hg). ‡ Dx represents diagnosis. * Represents treatment.

3.2. Identification of HbA1c-Associated DNA Methylation Loci

A total of 514,292 methylation probes survived the data cleaning and quality control measures. To identify HbA1c-associated DNAm loci, each probe was regressed against HbA1c values, controlling for age, sex and BMI. A total of 23 DNAm probes were associated with HbA1c at a false-discovery rate (FDR) of <5% (Table 2). Consistent with prior studies, cg19693031 was the most highly associated probe (R2 = 0.1976, p = 4.43 × 10−12). As illustrated in Figure 1, cg19693031 was significantly demethylated in diabetic subjects compared to both normoglycemic controls (62% vs. 66%; t = 5.089, p = 1.06 × 10−6) and prediabetic subjects (62% vs. 66%; t = −4.4165, p = 2.12 × 10−5).
Table 2

Top 21 ranked HbA1c-associated DNAm loci.

RankCpGGenePositionIsland StatusSEMp-Value
1cg19693031TXNIP3′UTROpen Sea-4.43 × 10−12
2cg02458882OPTN5′UTRIslandHMO3.78 × 10−7
3cg21804949 Open SeaLMO5.28 × 10−7
4cg14955495CASKINTSS1500IslandHMO2.16 × 10−6
5cg18890830GPX6BodyOpen SeaLMO2.86 × 10−6
6cg22544867NELFCDTSS1500N ShoreLMO4.35 × 10−6
7cg03577153ZNF350TS200Open SeaHMO4.95 × 10−6
8cg14277924ATP10DBodyOpen SeaLMO8.16 × 10−6
9cg02184744ANKRD11BodyOpen SeaLMO3.35 × 10−4
10cg14471895FAM120AOSBodyIslandHMO5.30 × 10−4
11cg06655623RASGEF1ATSS1500Open SeaHMO8.09 × 10−4
12cg14691530 S ShelfLMO9.90 × 10−4
13cg13990141FAMQ20A0SBodyIslandHMO8.62 × 10−3
14cg22942097 IslandHMO0.012
15cg01105172 Open SeaLMO0.018
16cg26823705NBPF20BodyOpen SeaHMO0.020
17cg24623376IL2RABodyOpen SeaLMO0.020
18cg10536901SLC25A245′UTRIslandLMO0.020
19cg05922331 Open SeaLMO0.030
20cg03916864UNGTSS1500IslandHMO0.037
21cg12936627PTTG1IPTSS200IslandHMO0.037
22cg25962218LRGUKBodyOpen SeaHMO0.039
23cg05055927 Open SeaLMO0.043

Top 23 significant DNAm loci associated with HbA1c after FDR correction for multiple comparisons.

Figure 1

Distribution of cg19693031 methylation across T2D status.

The distribution and variability of the methylation values were evaluated for each of the identified HbA1c-associated DNAm loci. With the exception of cg19693031 and cg26823705 (Figure 2), the associations to HbA1c were driven by outliers. Consistent with the definition by Gentilini and colleagues, a total of 49 stochastic epimutations (SEMs) were identified in 27 subjects across the 23 DNAm loci. With the exception of cg1963031, SEMs were observed in each probe. The SEMs were significantly enriched in diabetic subjects (OR = 0.135, 95% CI = 0.072–0.248, p-value = 1.40 × 10−11, two-tailed Fischer’s exact test), with 30 of the 49 total number of SEMs found in diabetic subjects. Further, six subjects had more than one SEM across the twenty-three probes, four of whom were diabetic.
Figure 2

Distribution of cg26823705 across T2D status.

The SEMs were further classified into two groups based on the methylation value compared to the population. A total of 26 SEMs were hypermethylated and classified as high methylation outliers (HMO), and a total of 23 SEMs were hypomethylated and classified as low methylation outliers (LMO). Each of the 22 probes contained either HMOs (n = 11) or LMOs (n = 11); however, none contained both. Notably, 14 (54%) and 23 (70%) of the identified HMOs and LMOs were observed in diabetic subjects. To further characterize the HMOs and LMOs, we examined their locations in relation to CpG island regions. The majority of DNAm loci with HMOs resided in CpG islands (n = 7), with the remainder in open seas (n = 3). Notably, the Hidden Markov Model predicts an additional three of these loci are located within an island. Comparatively, the majority of DNAm loci with LMOs resided in the open sea (n = 8), with the remainder in subsequent shores (n = 1), shelves (n = 1) and islands (n = 1). These findings are consistent with those previously reported by Wang et al. [17]. All the DNAm loci with SEMs were excluded from further analyses. A total of 37009 SEMs were detected in 64% (n = 13,528) of the 21269 HbA1c-associated DNAm probes with nominal significance. The total number of SEMs per probe ranged from 0 (n = 7740) to 102 across the 506 subjects. Of the 13,528 probes with SEMs detected, a majority (84%) contained less than five SEMs across the cohort. Additionally, a majority of the SEMs detected were HMOs (n = 25,620). The total number of SEMs per subject ranged from 3 to 7271 across the 21,269 probes, with a median of 20 ± 410 SEMs per subject. No relationship was observed in the number of SEMs with T2D status, age, gender or BMI. With the exception of cg19693031 and cg26823705, the association of each probe to HbA1c diminished after the removal of SEM observations.

3.3. Identification of HbA1c-Associated Polymorphic DNAm Loci

A total of 302,943 polymorphic probes were identified using the EPIC manifest file. Each probe was regressed against HbA1c values adjusted for age, sex and BMI. A total of 54 polymorphic probes were associated with HbA1c at an FDR < 5%. The top 30 ranked HbA1c-associated polymorphic probes are described in Table 3.
Table 3

Top 30 polymorphic CpGs associated with HbA1c values.

RankCpGGeneSNPp-Value
1cg03422185BCKDHArs1451731400.0003
2cg18909131KIAA0427rs773895860.0003
3cg04608829CYP4F12rs739268670.0003
4cg22228131BCAR3rs5277942380.0004
5cg27350255 rs5415330800.0009
6cg04291024TRAPPC9rs1148932010.0009
7cg00539624 rs534060211, rs1160036790.0009
8cg04272044RAD50rs732577570.0013
9cg21808468MCCrs18224870.0014
10cg08491197 rs764976000.0020
11cg07873290 rs1144341590.0020
12cg26502489SCN2Brs78096838, rs5699926260.0020
13cg00888992BRF1rs1161016360.0022
14cg09208162NDUFA10rs1388993260.0024
15cg20494686 rs148229470, rs1411392680.0025
16cg22098375GABBR1rs556388914, rs5642277380.0025
17cg05614028 rs5593291560.0033
18cg19804132CCL8rs1813025240.0045
19cg15166058ACTG2rs1509406640.0047
20cg23511885 rs622193390.0051
21cg14264512AKAP9rs5409812230.0054
22cg18049287ZBP1rs5329360930.0062
23cg16018596 rs789992820.0068
24cg23879504PRLRrs538603339, rs798237710.0077
25cg01704651C1ORF167rs1897378700.0083
26cg25921170ABL2rs5534158410.0097
27cg23576118RGS6rs363220.0137
28cg21442599SMIM8rs181909564, rs1872489400.0137
29cg19532212TEX14rs1858427720.0142
30cg21606577RSPRY1rs1413501180.0142

Top 30 ranked significant polymorphic CpG loci associated with HbA1c after FDR correction for multiple comparisons.

Of the 54 polymorphic DNAm probes, the island status of 11 were identified. The majority resided in shores (n = 7), with the remainder in subsequent islands (n = 3) and shelves (n = 1). The 54 probes mapped to 37 unique genes, with functional enrichment using UniProt for alternative splicing (FDR p-value = 0.0016). These genes also mapped to several relevant KEGG pathways, including type 1 diabetes mellitus, insulin secretion, PI3K-Akt signaling and diabetic cardiomyopathy.

3.4. Genome-Wide cg19693031 meQTL Analyses

Methylation quantitative trait loci (meQTL) analyses were conducted in both cis and trans to identify genetic factors that confound the methylation status of cg19693031. As underlying genomic mechanisms may differ according to proximity, our analyses were separated into: (1) cis meQTL (SNP-CpG distance < 1 Mb, n = 117), (2) long-range cis meQTL (>1 Mb intra-chromosomal, n = 38,750) and (3) trans meQTL (inter-chromosomal, n = 459,464). A total of 7 and 2052 meQTLs acting in cis and long-range cis were detected with nominal significance, with the degree of explained methylation variance ranging from 0.45% to 1.35% and 0.41% to 4.16%, respectively. While the cis long-range meQTLs explained more variance than those acting in cis, the magnitude of explained variance did not correlate with the SNP’s proximity to cg19693031. The meQTLs acting in cis are described in Table 4, while the top 30 long-range cis-meQTLs are described in Table 5. A total of 23,734 meQTLs acting in trans were also detected with nominal significance, with the degree of explained methylation variance ranging from 0.39% to 4.35%. Similar to those acting in cis, there were no observable differences between the chromosomes. The top 30 trans meQTLs are described in Table 6.
Table 4

Cis-meQTLs nominally associated with cg19693031.

RankSNPGeneDistanceAdj R2p-Value
1rs587743199NOTCH2NLA2.26 × 1050.013530.003
2rs16827006ITGA109.83 × 1040.010280.008
3rs16827043 4.61 × 1040.0091990.011
4rs61741868ITGA109.99 × 1040.0058410.030
5rs10910843RNF1152.50 × 1050.0052170.036
6rs112006139PDE4DIP4.87 × 1050.0048550.037
7rs5617144ANKRD351.14 × 1040.0044910.045
Table 5

Top 30 long-range cis-meQTLs nominally associated with cg19693031.

RankSNPGeneDistanceAdj R2p-Value
1rs17023177PTPN146.93 × 1070.04161.52 × 10−6
2rs12718444SLC2A11.02 × 1080.03241.86 × 10−5
3rs2086856SLC2A11.02 × 1080.03092.80 × 10−5
4rs6702764 9.92 × 1070.03082.87 × 10−5
5JHU_1.230641162 8.52 × 1070.033.57 × 10−5
6rs34247575PTGER37.41 × 1070.02688.58 × 10−5
7JHU_1.19366997 1.26 × 1080.02678.80 × 10−5
8rs74121148 1.19 × 1070.02669.06 × 10−5
9rs2792599 1.02 × 1080.02531.30 × 10−4
10rs9326132 1.02 × 1080.02531.31 × 10−4
11JHU_1.78608038 6.68 × 1070.0251.43 × 10−4
12rs57013566 9.97 × 1070.02531.61 × 10−4
13rs12091692 5.29 × 1070.02342.22 × 10−4
14rs72895742TTC229.02 × 1070.02292.52 × 10−4
15rs6699702PRDX19.95 × 1070.02292.52 × 10−4
16rs2422139C1orf1052.69 × 1070.02272.63 × 10−4
17rs10911734 3.40 × 1070.02272.67 × 10−4
18JHU_1.45997600 9.94 × 1070.02252.85 × 10−4
19JHU_1.992041 1.40 × 1080.02273.47 × 10−4
20JHU_1.46359270MAST29.91 × 1070.02094.35 × 10−4
21rs2280511GLIS19.15 × 1070.02084.50 × 10−4
22JHU_1.90440644 5.50 × 1070.02035.04 × 10−4
23JHU_1.49307071AGBL49.61 × 1070.02035.04 × 10−4
24JHU_1.26626016UBXN111.19 × 1080.02035.16 × 10−4
25rs78580993 3.99 × 1070.02025.18 × 10−4
26exm111496FCRL51.21 × 1070.02025.29 × 10−4
27rs10864087 6.89 × 1070.02015.39 × 10−4
28rs12033260LOC1019272447.42 × 1070.02015.43 × 10−4
29rs3935008PTPRVP5.67 × 1070.025.47 × 10−4
30rs11265240 1.41 × 1070.01995.69 × 10−4
Table 6

Top 30 distal meQTLs nominally associated with cg19693031.

RankSNPGeneChromosomeAdj R2p-Value
1rs181228225 150.04251.30 × 10−6
2rs7447064830.04181.44 × 10−6
3rs29620050.03764.46 × 10−6
4rs2390998130.03665.86 × 10−6
5rs744439140.03646.33 × 10−6
6JHU_5.10609303650.03588.38 × 10−6
7rs1054013MEG3140.03461.02 × 10−5
8exm-rs11177669 120.03441.09 × 10−5
9rs7330858 130.03381.30 × 10−5
10rs1451452 20.03371.30 × 10−5
11JHU_18.63134356 180.03361.38 × 10−5
12JHU_16.9476218 160.03281.69 × 10−5
13rs117580705 110.03261.74 × 10−5
14exm-rs11221332ETS1110.0322.09 × 10−5
15rs1380989FGF1230.03152.43 × 10−5
16rs331581 50.03043.24 × 10−5
17rs3794207CAMKK2120.03023.38 × 10−5
18rs568941 160.02973.93 × 10−5
19rs7142470LINC00520140.02963.97 × 10−5
20exm-rs2523608HLA-B60.02964.01 × 10−5
21rs266273 20.02944.24 × 10−5
22JHU_15.87493750AGBL1150.02954.28 × 10−5
23rs73392704 140.02924.50 × 10−5
24exm523653POM121L260.02914.57 × 10−5
25rs6887332CDX150.0294.75 × 10−5
26JHU_14.23027963 140.02894.85 × 10−5
27JHU_3.16412493 30.03124.98 × 10−5
28rs11793972FAM155A90.02885.94 × 10−5
29rs4442648 130.02855.43 × 10−5
30rs73986979FIBCD1170.02825.89 × 10−5
To determine the magnitude of meQTL effects on cg19693031 methylation, a multivariate linear regression model was built using meQTLs that demonstrated an R2 ≥ 0.03 (long-range cis (n = 5) and trans (n = 18)). Using an additive step-wise approach, genetic factors were ranked and fed into the model based on their adj R2. The associations between two meQTLs and cg19693031 methylation were no longer observed when incorporated into the model and were dropped from further analyses. The final meQTL model captured 43.23% of the variance of cg19693031 methylation (p < 2.2 × 10−16). It has been previously suggested that some of the meQTL effects may in fact be occurrences of genetic and environmental interactions (GxE). To better understand the genetic and environmental confounding architecture influencing the methylation of cg19693031, we assessed whether additive or interactive effects explain the variability of cg19693031 methylation better than genetic factors alone. As a means of comparison, HbA1c was first regressed against cg19693031 methylation adjusted for age (R2 = 0.1314, p < 2.2 × 10−16). With the incorporation of HbA1c values to the final meQTL model (G + E), the amount of variance captured increased by 6% (R2 = 0.4968, p < 2.2 × 10−16). Using ANOVA, the additive G + E model significantly outperformed the genetic factors alone (p = 2.62 × 10−14). Next, GxE analyses were conducted and compared to the G + E effects for each of the 21 retained meQTLs. Only one meQTL, rs74439, demonstrated a significant GxE term (p = 0.0146). The GxE effect explained ~ 1% more variance than the G + E model (R2 = 0.1712 vs. R2 = 0.1630) and demonstrated a significantly better fit (ANOVA: p = 0.0156). Incorporating the rs74439 GxE term to the final G + E meQTL model captured an additional 1% of the variance (Table 7: R2 = 0.5077, p < 2.2 × 10−16), and explained the variability of cg19693031 methylation better than genetic or additive G + E factors alone (ANOVA: p = 8.896 × 10−16 and p = 7.49 × 10−4, respectively).
Table 7

Association estimates for features in final multivariate G + E meQTL regression model with one GxE term for cg19693031.

Featureβp-Value
A1C−0.021***
rs1812282250.024***
rs11221332−0.021***
JHU_18.631343560.017***
rs170231770.012***
rs23909980.013***
rs744439:A1C0.008***
rs6702764−0.014***
rs744706480.011**
rs3794207−0.012**
rs331581−0.012**
JHU_1.230641162−0.009**
rs1451452−0.011**
rs73308580.018**
rs1175807050.018**
JHU_5.106093036−0.010**
rs744439−0.044**
rs13809890.009**
rs2962000.009**
rs12718444−0.012**
JHU_16.94762180.012*
rs11177669−0.007*
rs1054013−0.013*

p-value significant codes: <0 (***), <0.001 (**), <0.01 (*). Overall model performance: adj R2 = 0.5077 and p-value < 2.2 × 10−16.

3.5. Functional Mapping of meQTL SNPs

Next, we performed secondary analyses to better characterize the functional effects of genetic variation on cg19693031 methylation. The leading hypothesis to explain cis-meQTL effects is that, if a meQTL is located in a TFBS and hinders the TF from binding, the methylation of surrounding CpGs can be altered directly (recruitment of DNMT or TET enzymes) or indirectly. Of the 25,792 meQTLs, a total of 21 had confirmed associations with T2D via genome-wide association studies. Additionally, a total of 986 are located within exons and 335 are missense mutations. The meQTL SNPs mapped to 4687 genes, with enrichment of two or more SNPs found within 2075 genes. These genes functionally map to several processes, including: alternative splicing (p = 2.09 × 10−25), ABC transporters (p = 1.7 × 10−4), insulin secretion (p = 8.1 × 10−4), lymphocyte mediated immunity (p = 2.61 × 10−4), immunoglobulin production (p = 1.19 × 10−4), complement activation (p = 2.50 × 10−6), immune system process (p = 2.32 × 10−7) and calcium ion transmembrane transporter activity (p = 6.23 × 10−4). To gain insights into the molecular networks involved in nuclear regulation underlying the relationship between genetic variation and cg19693031 methylation, we assessed whether the proximal candidate gene at a trans-acting locus showed covariation with cg19693031 methylation. Using data from the eQTLGen Consortium (n = 31,684 samples), a total of 14,826 cis-eQTL associations with 5955 unique trans-meQTLs were identified with Bonferroni-corrected significance, with 3249 meQTL SNPs influencing the expression of 2+ genes. The identified cis-eQTLs mapped to 6861 unique genes, with enrichment of two or more cis-eQTLs influencing 3542 genes. While the biological mechanisms underlying cis-meQTLs are easily understood and experimentally demonstrated, few clear examples for trans have been uncovered. The simplest hypothesis is that SNPs that act as eQTLs of global methylation regulators have downstream effects as trans-acting meQTLs. A similar potential mechanism is that an SNP residing in the coding region of methyl-specific-binding-proteins could alter their specificity and function and, therefore, modify the DNAm of their binding sites. To that end, several of the identified eQTL genes are known methylomic regulators, including: DNAm writers (DNMT1 and DNMT3a), readers (MBD2, MBD5, UHRF1, UHRF2 and BAZ2B) and erasers (TDF and SMUG1). Further, several of the eQTL genes are known regulators of TXNIP, including: transcription factors (FOXO1, HSF2, RXRA and KLF6), chromatin modifications (HDACs and NF-YA) and factors regulating mRNA turnover (mir-17, ZFP36, NEDD1, NEDD4, NEDD9, WWP1, SMURF2 and HECW2). Provided that distal residues can be brought into physical proximity by 3D structures, an SNP could affect DNAm levels at CpG sites in trans by acting either through cis-meQTL mechanisms or by disrupting the formation of structural loops. To that end, two of the three cohesion subunits were identified as an eQTL gene (SMC3 and RAD21). To gain insights into the potential role of trans meQTL effects on cg19693031 in the pleiotropic functions of TXNIP, functional gene mapping was conducted utilizing KEGG and GO. The eQTL genes mapped to key pathways that are associated with TXNIP’s pathophysiological role in diabetes and coronary heart disease, including: metabolic pathways (n = 504), PI3K-Akt signaling (n = 95), MAPK signaling (n = 81), cytokine-cytokine receptor interaction (n = 72), Rap1 signaling (n = 66), lipid and atherosclerosis (n = 65), ras signaling (n = 59), calcium signaling (n = 56), mTOR (n = 55), cAMP signaling (n = 52), diabetic cardiomyopathy (n = 49), insulin signaling pathway (n = 48), apoptosis (n = 46), Wnt signaling (n = 45), insulin resistance (n = 45), AMPK signaling (n = 45), FoxO signaling (n = 44), natural killer cell mediated cytotoxicity (n = 44), non-alcoholic fatty liver disease (n = 43), TNF signaling (n = 39), leukocyte transendothelial migration (n = 37), NK-kappa B signaling (n = 35), AGE-RAGE signaling pathway in diabetic complications (n = 34), TGF-β signaling (n = 33), b cell receptor signaling (n = 33), glucagon signaling (n = 27), type 1 diabetes (n = 23), PPAR signaling (n = 21), hypertrophic cardiomyopathy (n = 21), regulation of lipolysis in adipocytes (n = 18), glycolysis and gluconeogenesis (n = 18), insulin secretion (n = 18), ABC transporters (n = 17) and type 2 diabetes (n = 12). The eQTLs identified in type I diabetes and AGE-RAGE signaling in diabetic complications are illustrated in Figure 3 and Figure 4, respectively. TXNIP exerts its effects, in part, through the negative regulation of thioredoxin, which both TXN and TXN2 were identified as eQTL genes. Further, the eQTL genes also mapped to pathways associated with TXNIP’s pathophysiological role in cancer and neurodegenerative diseases, including: pathways in cancer (n = 159), pathways in neurodegeneration (n = 135), Alzheimer’s disease (n = 105), Huntington disease (n = 81), Parkinson’s disease (n = 68), microRNAs in cancer (n = 66), proteoglycans in cancer (n = 65), chemical carcinogenesis—ROS (n = 60), transcriptional regulation in cancer (n = 59), hepatocellular carcinoma (n = 52), Hippo signaling pathway (n = 50), cellular senescence (n = 49), breast cancer (n = 34) and p53 pathway (n = 29).
Figure 3

Cis e-QTL genes with trans-meQTL effects on cg19693031 identified in type 1 diabetes pathology. Cis e-QTL genes with meQTL effects on cg19693031 methylation are highlighted in red (n = 23).

Figure 4

Cis e-QTL genes with trans-meQTL effects on cg19693031 identified in AGE-RAGE signaling in diabetic complications. Cis e-QTL genes with meQTL effects on cg19693031 methylation are highlighted in red (n = 34).

3.6. Genome-Wide cg19693031 GxE Analysis

To gain a better understanding of the genetically contextual effect of HbA1c levels on cg19693031 methylation, genome-wide GxE analyses were conducted. A total of seven and 1355 GxE terms acting in cis and long-range cis were detected with nominal significance, with the degree of explained methylation variance ranging from 13.31% to 14.05% and 12.32% to 16.67%, respectively. A total of 16,760 GxE terms acting in trans were detected with nominal significance, with the degree of explained variance ranging from 10.98% to 17.80%. Of the 18,797 GxE terms, a total of 964 were identified in our meQTL analyses (long-range cis: n = 81; trans: n = 883). To determine the magnitude of GxE effects on cg19693031 methylation, an additive step-wise linear regression was fit using GxE terms that demonstrated an adj R2 ≥ 0.1550 and did not contain missing values (n = 44). Notably, a majority of the retained GxE terms were previously identified as meQTLs (n = 32). The features were ranked and fed into the model based on their adj R2, with retainment dependent on each feature remaining significant and a significant increase in the captured variance. A total of six GxE terms were retained for the final GxE model. The final model explained 27.78% of the variance of cg19693031 methylation (Table 8: p < 2.2 × 10−16), and significantly improved the fit compared to HbA1c alone (ANOVA: p < 2.2 × 10−16).
Table 8

Association estimates for features in final multivariate GxE regression model for cg19693031 methylation.

Featureβp-Value
rs22353380.059**
rs72914579−0.091**
A1C:rs2235338−0.008**
JHU_1.520626550.082**
rs112652400.058**
A1C:rs729145790.011*
JHU_2.2348249010.054*
A1C:JHU_1.52062655−0.011*
rs744439:A1C0.007*
A1C:JHU_2.234824901−0.008*
A1C:rs11265240−0.007*
rs744439−0.027
A1C0.007

p-value significiant codes: < 0.001 (**), < 0.01 (*). Overall model performance: adj R2 = 0.2778 and p-value < 2.2 × 10−16.

Next, we performed secondary analyses to better characterize the functional effects of the genetic variation of the GxE loci on cg19693031 methylation. Of the 18,797 SNPs, a total of nine had confirmed associations with T2D via genome-wide association studies. Additionally, a total of 681 are located within exons and 261 are missense mutations. The 18,797 GxE SNPs mapped to 3808 genes, with enrichment of two or more SNPs found within 1533 genes. Interestingly, these genes functionally mapped to alternative splicing (p = 1.183 × 10−36, the same as the HbA1c-associated polymorphic DNAm probes and meQTLs. Similar to the meQTLs, the GxE network mapped to several key TXNIP mechanistic pathways, including the cell adhesion (p = 1.74 × 10−11), glutamatergic synapse (p = 1.89 × 10−5), calcium signaling pathway (p = 1.7 × 10−4), insulin secretion (p = 1.6 × 10−3) and PI3K-Akt signaling pathway (p = 1.6 × 10−3).

3.7. Integrated Model of Genetic and Environmental Effects for cg19693031

Next, we assessed whether an integrated genetic and environmental model accounting for additive and interactive effects explains cg19693031 methylation variability better than either alone. Using a similar step-wise approach, a total of 20 G + E and 6 GxE terms were used to build the integrated model. The final integrated model captured 55.15% of the methylation variance (Table 9: p < 2.2 × 10−16) and performed significantly better than the genetic factors alone (ANOVA: 3.79 × 10−8).
Table 9

Association estimates for features in final multivariate integrated additive and interactive genetic and HbA1c regression model for cg19693031 methylation.

Featureβp-Value
rs1812282250.023***
rs11221332−0.019***
JHU_18.631343560.017***
rs744439:A1C0.009***
rs744706480.010**
rs744439−0.050**
rs11177669−0.009**
rs1451452−0.011**
JHU_1.230641162−0.008**
rs112652400.052**
rs170231770.011**
rs1175807050.018**
rs6702764−0.012**
JHU_1.520626550.072**
rs1054013−0.014**
rs70219110.138**
A1C:rs7021911−0.023**
rs12718444−0.012**
rs3794207−0.010**
rs2962000.009**
rs73308580.015*
rs331581−0.009*
rs13809890.008*
A1C:rs11265240−0.007*
A1C:JHU_1.52062655−0.010*
rs62437200−0.063*
rs23909980.009*
JHU_5.106093036−0.008*
rs7789476−0.007*
A1C:rs624372000.009*
JHU_16.94762180.009*
A1C0.036

p-value significant codes: <0 (***), <0.001 (**), <0.01 (*). Overall model performance: adj R2 = 0.5515 and p-value < 2.2 × 10−22.

3.8. cg19693031 Methylation Classifier for Predicting T2D Status

To determine the ability of cg19693031 to predict T2D status, a series of logistic regression classifiers were built, adjusting for age, sex and BMI. The covariate and cg19693031 models performed equally well in distinguishing diabetic from normoglycemic subjects (Table 10A: AUC = 0.72 and AUC = 0.73, respectively), and prediabetic from normoglycemic subjects (Table 10B: AUC = 0.52 and AUC = 0.53, respectively). However, the model incorporating cg1963031 performed better than the covariate model alone in distinguishing diabetic from prediabetic subjects (Table 10C: AUC = 0.68 and AUC = 0.72, respectively).
Table 10

Performance metrics of logistic regression models for predicting normoglycemic vs. diabetic subjects (A), normoglycemic vs. prediabetic subjects (B) and prediabetic vs. diabetic subjects (C).

Model Predictors Precision Sensitivity Specificity AUC
1Age + Sex + BMI0.6370%74%0.72
2Model 1 + cg196930310.6670%76%0.73
3Model 2 + rs72110.6574%74%0.74
4Model 2 + rs72120.6670%76%0.73
5Model 2 + rs92450.6163%74%0.68
6Model 2 + rs7211 + rs7212 + rs92450.6670%76%0.73
Missing values for predictors result in analysis sample size of 227 subjects for all models.
(A)
Model Predictors Precision Sensitivity Specificity AUC
1Age + Sex + BMI0.6858%46%0.52
2Model 1 + cg196930310.6859%46%0.53
3Model 2 + rs72110.757%54%0.55
4Model 2 + rs72120.6859%46%0.53
5Model 2 + rs92450.6861%44%0.52
6Model 2 + rs7211 + rs7212 + rs92450.6861%44%0.52
Missing values for predictors result in analysis sample size of 397 subjects for all models.
(B)
Model Predictors Precision Sensitivity Specificity AUC
1Age + Sex + BMI0.4367%69%0.68
2Model 1 + cg196930310.5167%78%0.72
3Model 2 + rs72110.5167%78%0.72
4Model 2 + rs72120.563%78%0.71
5Model 2 + rs92450.5367%79%0.73
6Model 2 + rs7211 + rs7212 + rs92450.4659%76%0.67
Missing values for predictors result in analysis sample size of 350 subjects for all models.
(C)
To determine if correcting for confounding local TXNIP genetic variation improves the predictive ability of the cg19693031 models for T2D status, a series of logistic regression classifiers were built, adjusting for age, sex and BMI. Three SNPS located within TXNIP that were previously identified as being associated with diabetes and cardiovascular disease were assessed. Of these, two were located near cg19693031 in the 3′UTR (rs7211 and rs7212), and one SNP was located in the 5′UTR. The incorporation of any local SNP or the addition of all three did not significantly improve the cg19693031 models for T2D status (Table 10).

3.9. Genome-Wide cg19693031 GxMeth Analyses for HbA1c

Based on the understanding that trans genetic variation can confound DNA methylation status, we conducted a genome-wide analysis of SNP interactions with cg19693031 using the Multi-Ethnic Beadchip. The HbA1c values were regressed against additive and interactive GxMeth terms for each SNP, controlling for age, sex and BMI, with a total of 614,729 independent analyses conducted. After correcting for multiple comparisons, a total of 1476 and 1987 SNPs had significant GxMeth and additive effects, respectively (Figure 5). The top-25-ranked GxMeth SNPs are provided in Table 11.
Figure 5

Manhattan plot illustrating the association of GxMeth and SNP terms with HbA1c. Results were adjusted for age, sex and BMI, and corrected for multiple comparisons. Associations at p < 1.33 × 10−7 are above the horizontal black line. The chromosome density, number of significant SNPs residing within a 1 Mb window, is located at the bottom of the plot and the color-coded legend is to the right.

Table 11

Top 25 ranked SNPs with GxMeth effects after Bonferroni correction.

RankSNPChrGeneMAFGxMeth p-ValueSNP p-Value
1rs19321896 0.0278.85 × 10−171.98 × 10−18
2rs107977451DNM30.0341.04 × 10−118.91 × 10−14
3rs611033320MACROD20.0623.44 × 10−118.52 × 10−12
4rs112443769 0.0256.64 × 10−112.57 × 10−12
5rs436963415 0.3167.32 × 10−111.19 × 10−11
6rs2890417SPACA30.0842.56 × 10−106.76 × 10−11
7rs28164908 0.1582.65 × 10−101.17 × 10−11
8rs788455671XPR10.0154.78 × 10−106.70 × 10−11
9rs7799870114 0.0366.45 × 10−105.23 × 10−11
10rs176499520 0.4777.13 × 10−107.68 × 10−11
11rs6206826217 0.0598.54 × 10−101.43 × 10−10
12rs11721919611 0.0271.20 × 10−93.79 × 10−11
13rs122049786 0.0591.24 × 10−91.28 × 10−10
14rs746358541 0.0121.86 × 10−91.53 × 10−10
15rs727114884 0.152.02 × 10−96.88 × 10−10
16rs13841872519MAP3K100.0142.18 × 10−91.98 × 10−11
17rs168408871 0.0233.85 × 10−91.46 × 10−9
18rs622906933 0.2694.84 × 10−91.65 × 10−9
19rs129672016CREBBP0.0326.70 × 10−95.26 × 10−10
20rs177031862 0.0959.96 × 10−91.72 × 10−9
21rs664627043 0.0791.60 × 10−87.75 × 10−10
22rs14251871813 0.0131.61 × 10−83.10 × 10−9
23rs1165194654 0.0132.00 × 10−89.16 × 10−10
24rs168595172NHEJ10.0132.02 × 10−81.59 × 10−9
25rs126766388 0.0152.45 × 10−84.84 × 10−10

Abbreviations: Chr, chromosome; MAF, minor allele frequency. Top 25 SNPs based on the Bonferroni corrected p-value of the Gxmeth term in regression analyses against HbA1c values adjusted for age, sex and BMI.

Next, we performed secondary analyses to better characterize the functional effects of genetic variation in the GxMeth loci on cg19693031 methylation. First, we conducted methylation quantitative trait loci (meQTL) analysis to determine the extent to which genetic variation in the GxMeth loci influences cg19693031 methylation. After Bonferroni correction, rs2390998 was identified as a meQTL (p = 0.045) with a small effect size (0.04, 95% CI = [0.01, 0.08]). As illustrated in Figure 6A, the average methylation across the 506 subjects was significantly lower in rs2390998 heterozygotes (AG) than major allele homozygotes (GG) (pHolm-corrected = 0.005, Δβ = 2%). Notably, the genetic influence of rs2390998 on methylation was only prominent in diabetic subjects, with the most significant difference in methylation seen between the homozygote genotypes (Figure 6B; pHolm-corrected = 4.63 × 10−11, Δβ = 13%; effect size = 0.16, 95% CI = [0.05, 0.32]). Next, we conducted a genetic x environment analysis (GxE) to determine the genetically contextual effect of HbA1c levels on cg19693031 methylation. After Bonferroni correction, rs1074390 had a significant GxE effect (p = 0.048; effect size = 0.165, 95% CI [0.11, 0.22]. While the average cg19693031 methylation did not significantly differ as a function of rs1074390 genotype across the cohort (Δβ = 1%), a significant difference was seen when stratified by T2D status (Figure 7). Similar to rs2390998, differences in methylation across the rs1074390 genotype were observed only in diabetic subjects, with the most significant difference in methylation observed between the homozygote genotypes (pHolm-corrected = 3.92 × 10−4; Δβ = 9%; effect size = 0.38, 95% CI [0.12, 0.56].
Figure 6

cg19693031 distribution as a function of rs2390998 genotype across all 506 subjects (A) and stratified by T2D status (B).

Figure 7

cg19693031 distribution as a function of rs1074390 genotype across all 506 subjects (A) and stratified by T2D status (B).

The significant GxMeth SNPs mapped to 560 unique genes, with enrichment of two or more SNPs found within 103 genes. Of those 560 genes, 167, 144 and 22 genes had confirmed associations to metabolic and cardiovascular disorders and type 2 diabetes, respectively [4,22]. To gain functional insights into these significant GxMeth interactions, a protein–protein interaction network (PPI) was generated using data from the STRING database. A total of 487 genes were correctly matched to the database. Figure 8 illustrates this network, with nodes only shown if they have an edge with a minimum interaction score of 0.9. This network consisted of 477 nodes with 158 edges (PPI enrichment p = 4.1 × 10−9), with an average node degree of 0.662. This network mapped to several biological process GO pathways that are highly associated with the development and complications of T2D, including regulation of voltage-gated sodium channel activity (p = 1.33 × 10−2), protein kinase C-activating G protein-coupled receptor signaling pathway (p = 1.26 × 10−2), cell communication involved in cardiac conduction (p = 1.41 × 10−2), cardiac muscle cell action potential involved in contraction (p = 3.65 × 10−2), regulation of heart rate (p = 6.63 × 10−3), calcium-mediated signaling (p = 2.30 × 10−2) and angiogenesis (p = 1.13 × 10−2). This network also mapped to several KEGG pathways, including glutamatergic synapse (p = 3.8 × 10−3), dilated cardiomyopathy (p = 7.5 × 10−3), Rap1 signaling pathway (p = 8.1 × 10−3), PI3K-Akt signaling pathway (p = 1.56 × 10−2), phospholipase D signaling pathway (p = 2.2 × 10−2), hypertrophic cardiomyopathy (p = 2.27 × 10−2) and Wnt signaling pathway (p = 2.42 × 10−2).
Figure 8

Protein–protein interactome (PPI) generated from regression analyses of significant cg19693031 GxMeth effects against HbA1c.

3.10. Integrated Genetic–Epigenetic Classifier for Predicting T2D Status

To determine if correcting for genetic factors with significant GxMeth effects improves the predictive ability of cg19693031 for T2D status, a series of balanced random forest (BRF) classifiers were built, adjusting for age, sex and BMI (Table 12). Using a repeated stratified k-fold cross-validation approach, each BRF classifier with random under-sampling was built to evaluate the features’ ability to discriminate between control and diabetic subjects. Similar to the logistic regression modeling, two base classifiers were built using the covariate variables solely and another incorporating cg19693031 (AUC = 0.68 and AUC = 0.71, respectively). Next, the 1476 GxMeth genetic factors were incorporated into the model, with a decrease in performance observed in comparison to the base models (AUC = 0.67). This decrease in performance is reflective of the random forest’s known vulnerability to overfitting due to correlated features with little discriminatory value.
Table 12

Balanced random forest models performance metrics for predicting normoglycemic vs. diabetic subjects.

ModelPredictorsPrecisionSensitivitySpecificityAUC
1Age + Sex + BMI0.5674%63%0.68
2Model 1 + cg196930310.5974%67%0.71
3Model 2 + 1476 GxMeth SNPs0.5470%63%0.67
4Model 2 + 551 selected GxMeth SNPs0.6967%81%0.74
5Model 2 + Top 10 GxMeth SNPs0.6381%70%0.76
To resolve this, we performed multivariate ensemble feature selection using random forest with a subsequent BRF classifier to reduce the dimensionality of the dataset. Multivariate approaches consider conditional higher-order interactions between several or all features simultaneously when measuring each feature’s relevance to the overall classification task. A total of 554 features were selected by the random forest learner, including the covariates, cg19693031 and 550 non-local GxMeth SNPs. Using a repeated stratified k-fold cross-validation approach with random under-samplings of the minority class to counteract class imbalance, a BRF model was built to evaluate the selected features for predicting control versus diabetic subjects. The performance of the model improved, with an AUC of 0.74. To further reduce the dimensionality of the dataset, we used the Gini importance to select the top-ranked relevant features. The selected features were ranked according to their Gini Importance, and the top 13 features were retained (Table 13). Using these features, a final BRF classifier was built. The classifier performed better than the previous models, with an AUC of 0.76 (81% sensitivity, 70% specificity). The confusion matrix and ROC curve are illustrated in Figure 9.
Table 13

Top ranked important features selected by a BRF for predicting normoglycemic vs. diabetic subjects.

RankFeature
1Age
2cg19693031
3BMI
4rs78125109
5rs9311874
6rs79244502
7rs10496366
8rs10496731
9rs10032200
10rs77173725
11rs2227818
12rs440617
13rs3218194
Figure 9

Testing of the GxMeth model using the top 13 ranked features for distinguishing normoglycemic controls and diabetic subjects.

4. Discussion

Using data from a group of African American adults, we confirm and extend the prior findings, showing that the methylation of cg19693031 has a significant dose-dependent relationship with HbA1c. We identify confounding genetic variations influencing the methylation status of cg19693031 both independent of and in combination with HbA1c. The pathway analyses identified the regulatory mechanisms related to the 3D genomic structure and TXNIP’s role in disease susceptibility. Finally, we demonstrate that correcting for genetic confounding improves the ability of cg19693031 to detect type 2 diabetes. Collectively, our analyses suggest that the demethylation response of cg19693031 to sustained hyperglycemia in white blood cells may play a pivotal role in the regulatory network effects that underlie several clinical phenotypes, with insights into TXNIP’s pleiotropic functions for disease onset and progression. To the best of our knowledge, our study is the first full integration of genetic and hyperglycemia effects on cg19693031 methylation, laying the groundwork for a more comprehensive understanding of the molecular mechanisms linking the methylation status of TXNIP’s 3′UTR to disease. Collectively, we identify 11,240 genes that participated in significant interactions. The enrichment and functional genomic analyses suggest the crosstalk between genetic variants and the demethylation response of cg19693031 may contribute to the dysregulation of gene expression seen in diabetes. For example, TXNIP has been shown to be induced by glucose and overexpressed in diabetic patients [23]. In addition, the TXNIP locus is complexly regulated by multilayered mechanisms, including transcriptional regulation, microRNA, mRNA stabilization and protein degradation [24]. Our analyses identified several of these regulators, including transcription factors (FOX01, HSF2, RXRA and KLF6) and modulators of mRNA stability (mir-17, ZFP36, NEDD1, NEDD4, NEDD9 and WWP1). In addition, several factors known to influence genome and epigenome plasticity were identified, including methylome regulators, histone modifiers and subunits of the cohesion complex. While our analyses suggest that the interplay between cg1963031 methylation and genetic variation may influence the transcriptional program of leukocytes in the context of hyperglycemia, external validation with expression data is needed. Prior studies have suggested the utility of TXNIP as a biomarker for diabetes due to its hyperglycemia-induced overexpression and the strong association of cg19693031 with HbA1c [7]. While the demethylation of cg19693031 correlates to an increase in TXNIP transcription [5], this upregulation cannot be adequately explained by the demethylation of a 3′UTR CpG site as others have suggested. Whereas methylation within the CpG islands and shores near promoter regions correlates to transcriptional repression [25], the literature to support the same conclusion within gene bodies is lacking. Indeed, the recent work by Albao and colleagues suggests that the demethylation of cg19693031 promotes spurious transcription of the 3′UTR [26]. Prior studies have demonstrated the regulation of TXNIP via miRNA directed degradation of the mRNA, notably miR-17 [27]. Therefore, the spurious transcription of the 3′UTR intragenic region could potentially sequester the miRNA degradation by acting as a competitive endogenous RNA, resulting in an increase in TXNIP mRNA stability. Our findings further support the hypothesis that the demethylation response of cg19693031 may increase the mRNA stability of TXNIP. Further research is needed to validate and extend upon the role of the methylome in the regulation of TXNIP. Our findings will be particularly useful for those seeking to better understand the pathophysiological function of TXNIP in order to optimize diabetes medicine. For example, several cg19693031 interacting genes were enriched in both type 1 (n = 27) and type 2 diabetes (n = 26). In addition, biological pathways related to diabetes were identified, including: insulin section, downstream signaling of insulin and glucagon and sensory perception of chemical stimulus. TXNIP emerged as a viable therapeutic target for diabetes due to its role in pancreatic β-cell apoptosis via activation of the mitochondrial death pathway [28]. The activation of the NLRP3 inflammasome results in the release of proinflammatory cytokines and apoptosis [29]. Critical to this report, TXNIP-induced inflammasome activation and IL-1β production are primarily observed in resident innate immune cells and not the islet β cells themselves [30,31,32]. Pathway analyses identified several mechanisms related to β-cell destruction, such as T-cell signaling and cytotoxic effects and ROS-induced apoptosis. Remarkably, strong interactions were observed with five major islet autoantigens (INS, GAD, IA-2, CPE and ICA). A total of 88 and 66 confounding genetic variants mapped to the NOD-like receptor signaling pathway and apoptosis. In particular, cg19693031 interacted with key loci for inflammasome activation (TRX, regulators of NLRP3 expression and NLRP3 itself), proinflammatory effects (IL-1β, IL-18, IL-6 and TNFα), death receptors (TRAIL-R, Fas and FADD) and other pro-apoptotic genes (BCL-2, BAD, JNK, BAK, ATM and PUMA). Further, TXNIP has been implicated in the development of diabetes-induced complications, including coronary heart disease. In this report, we identified loci mapping to lipids and atherosclerosis (n = 102), diabetic cardiomyopathy (n = 73), insulin resistance (n = 66) and AGE-RAGE diabetic complications (n = 59). This suggests that the pathophysiological role of the demethylation response of cg19693031 may generalize across the diabetes subtypes. As the diagnosis of diabetes mellitus is based on clinical presentation rather than specific molecular defects, overlaps in their etiologies have been documented. Indeed, prior studies have demonstrated the association of cg19693031 methylation and T1D. Further, current clinical trials are investigating the use of Verapamil for treatment of T1D, which has been shown to decrease the expression of TXNIP. Altered TXNIP activity has been implicated in the development of several diseases aside from diabetes, including atherosclerosis, cancer and neurodegenerative diseases [33,34,35,36]. Our functional analyses identified enrichment in several of these diseases, including pathways in cancer (n = 274), pathways in neurodegeneration (n = 205), Alzheimer’s disease (n = 155), microRNAs in cancer (n = 106), transcriptional misregulation in cancer (n = 100) and the Hippo signaling pathway (n = 84). Moreover, prior studies have demonstrated an association between cg19693031 methylation and chylomicrons (Class A), triglycerides, hexose, α-hydrobutyrate, systolic hypertension, waist circumference, gender, metabolic syndrome, type 1 diabetes, coronary heart disease, breast cancer and colon cancer [7,37,38,39,40]. Taken together, this suggests that the demethylation response of cg19693031 in white blood cells is not specific to diabetes, let alone T2D. Therefore, the clinical use of cg19693031 methylation status in its current form as a biomarker for T2D diabetes may lead to inappropriate clinical decisions and poor patient outcomes. Extensive follow-up studies will be required to demonstrate the ability of cg19693031 methylation to accurately stratify diabetes risk in spite of the non-specific demethylation response. We also note that, among those with HbA1c confirmed diabetes, only 57% had come to clinical attention, and less than 1% were on diabetic medications. This observation is reflective of the long asymptomatic phase of the disease, and health disparities seen in African American communities, resulting in a high rate of undiagnosed T2D. Additionally, control and pre-diabetic subjects reported a DM diagnosis, 3% and 9%, respectively, but they did not report the use of anti-diabetic pharmacological therapy. This discordance may be reflective of subjects reporting a pre-diabetic diagnosis, or lowered glucose levels due to lifestyle changes or bariatric surgery. The findings of this study should be considered in the context of its limitations. First, it is important to note that the FACHS cohort is almost exclusively mature adult African American women. Further examinations using a larger number of subjects of other ethnicities, age groups and equal proportions of biological sexes will be required to demonstrate generalizability. Second, although the HbA1c test is the clinical gold standard, it is not sensitive to all pre-diabetes phenotypes [41]. Therefore, replicating our findings with other glycemic indices is needed. Third, the SNPs identified with having significant interactions with cg19693031 DNAm may not be the polymorphism driving the interaction; instead, it may be a tag SNP in full or partial equilibrium with another SNP that is driving the interaction. Additionally, there are known limitations to the Illumina EPIC arrays, such as cross-reactive and polymorphic probes, and β-value skewing due to the normalization process [42,43]. Hence, a more precise method for measuring cg19693031 DNAm, such as digital PCR, should be used to validate and extend our findings. Finally, cg19693931 methylation is confounded by multiple disease processes and cellular contexts. Therefore, while cg19693031 methylation may be sensitive to sustained hyperglycemia, it is not specific to diabetes. In summary, we confirm and extend the prior findings of the demethylation response of cg19693031 to increasing HbA1c values in an African American cohort. This study is the first to extensively examine the genetic and hyperglycemia contextual effects on cg19693031 methylation, laying the groundwork for a better understanding of TXNIP’s pleiotropic pathophysiological effects in the development of disease. Future studies examining the extent of non-diabetes cellular-context-confounding cg19693031 methylation are needed to assess the feasibility for translation and potential clinical utility.
  40 in total

1.  Immunoturbidimetric assay of glycated hemoglobin.

Authors:  P Metus; N Ruzzante; P Bonvicini; M Meneghetti; M Zaninotto; M Plebani
Journal:  J Clin Lab Anal       Date:  1999       Impact factor: 2.352

2.  STRING: a database of predicted functional associations between proteins.

Authors:  Christian von Mering; Martijn Huynen; Daniel Jaeggi; Steffen Schmidt; Peer Bork; Berend Snel
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

3.  The genetic association database.

Authors:  Kevin G Becker; Kathleen C Barnes; Tiffani J Bright; S Alex Wang
Journal:  Nat Genet       Date:  2004-05       Impact factor: 38.330

Review 4.  A review and analysis of the clinical laboratory improvement amendment of 1988: compliance plans and enforcement policy.

Authors:  Patrick A Rivers; Aram Dobalian; Francesco A Germinario
Journal:  Health Care Manage Rev       Date:  2005 Apr-Jun

5.  Diagnosis and classification of diabetes mellitus.

Authors: 
Journal:  Diabetes Care       Date:  2014-01       Impact factor: 19.112

Review 6.  Economic costs of diabetes in the U.S. In 2007.

Authors: 
Journal:  Diabetes Care       Date:  2008-03       Impact factor: 19.112

7.  Resident macrophages mediate islet amyloid polypeptide-induced islet IL-1β production and β-cell dysfunction.

Authors:  Clara Y Westwell-Roper; Jan A Ehses; C Bruce Verchere
Journal:  Diabetes       Date:  2013-11-12       Impact factor: 9.461

Review 8.  Genetic impacts on DNA methylation: research findings and future perspectives.

Authors:  Sergio Villicaña; Jordana T Bell
Journal:  Genome Biol       Date:  2021-04-30       Impact factor: 13.583

9.  Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling.

Authors:  Ruth Pidsley; Elena Zotenko; Timothy J Peters; Mitchell G Lawrence; Gail P Risbridger; Peter Molloy; Susan Van Djik; Beverly Muhlhausler; Clare Stirzaker; Susan J Clark
Journal:  Genome Biol       Date:  2016-10-07       Impact factor: 13.583

10.  The genetic architecture of type 2 diabetes.

Authors:  Christian Fuchsberger; Jason Flannick; Tanya M Teslovich; Anubha Mahajan; Vineeta Agarwala; Kyle J Gaulton; Clement Ma; Pierre Fontanillas; Loukas Moutsianas; Davis J McCarthy; Manuel A Rivas; John R B Perry; Xueling Sim; Thomas W Blackwell; Neil R Robertson; N William Rayner; Pablo Cingolani; Adam E Locke; Juan Fernandez Tajes; Heather M Highland; Josee Dupuis; Peter S Chines; Cecilia M Lindgren; Christopher Hartl; Anne U Jackson; Han Chen; Jeroen R Huyghe; Martijn van de Bunt; Richard D Pearson; Ashish Kumar; Martina Müller-Nurasyid; Niels Grarup; Heather M Stringham; Eric R Gamazon; Jaehoon Lee; Yuhui Chen; Robert A Scott; Jennifer E Below; Peng Chen; Jinyan Huang; Min Jin Go; Michael L Stitzel; Dorota Pasko; Stephen C J Parker; Tibor V Varga; Todd Green; Nicola L Beer; Aaron G Day-Williams; Teresa Ferreira; Tasha Fingerlin; Momoko Horikoshi; Cheng Hu; Iksoo Huh; Mohammad Kamran Ikram; Bong-Jo Kim; Yongkang Kim; Young Jin Kim; Min-Seok Kwon; Juyoung Lee; Selyeong Lee; Keng-Han Lin; Taylor J Maxwell; Yoshihiko Nagai; Xu Wang; Ryan P Welch; Joon Yoon; Weihua Zhang; Nir Barzilai; Benjamin F Voight; Bok-Ghee Han; Christopher P Jenkinson; Teemu Kuulasmaa; Johanna Kuusisto; Alisa Manning; Maggie C Y Ng; Nicholette D Palmer; Beverley Balkau; Alena Stančáková; Hanna E Abboud; Heiner Boeing; Vilmantas Giedraitis; Dorairaj Prabhakaran; Omri Gottesman; James Scott; Jason Carey; Phoenix Kwan; George Grant; Joshua D Smith; Benjamin M Neale; Shaun Purcell; Adam S Butterworth; Joanna M M Howson; Heung Man Lee; Yingchang Lu; Soo-Heon Kwak; Wei Zhao; John Danesh; Vincent K L Lam; Kyong Soo Park; Danish Saleheen; Wing Yee So; Claudia H T Tam; Uzma Afzal; David Aguilar; Rector Arya; Tin Aung; Edmund Chan; Carmen Navarro; Ching-Yu Cheng; Domenico Palli; Adolfo Correa; Joanne E Curran; Denis Rybin; Vidya S Farook; Sharon P Fowler; Barry I Freedman; Michael Griswold; Daniel Esten Hale; Pamela J Hicks; Chiea-Chuen Khor; Satish Kumar; Benjamin Lehne; Dorothée Thuillier; Wei Yen Lim; Jianjun Liu; Yvonne T van der Schouw; Marie Loh; Solomon K Musani; Sobha Puppala; William R Scott; Loïc Yengo; Sian-Tsung Tan; Herman A Taylor; Farook Thameem; Gregory Wilson; Tien Yin Wong; Pål Rasmus Njølstad; Jonathan C Levy; Massimo Mangino; Lori L Bonnycastle; Thomas Schwarzmayr; João Fadista; Gabriela L Surdulescu; Christian Herder; Christopher J Groves; Thomas Wieland; Jette Bork-Jensen; Ivan Brandslund; Cramer Christensen; Heikki A Koistinen; Alex S F Doney; Leena Kinnunen; Tõnu Esko; Andrew J Farmer; Liisa Hakaste; Dylan Hodgkiss; Jasmina Kravic; Valeriya Lyssenko; Mette Hollensted; Marit E Jørgensen; Torben Jørgensen; Claes Ladenvall; Johanne Marie Justesen; Annemari Käräjämäki; Jennifer Kriebel; Wolfgang Rathmann; Lars Lannfelt; Torsten Lauritzen; Narisu Narisu; Allan Linneberg; Olle Melander; Lili Milani; Matt Neville; Marju Orho-Melander; Lu Qi; Qibin Qi; Michael Roden; Olov Rolandsson; Amy Swift; Anders H Rosengren; Kathleen Stirrups; Andrew R Wood; Evelin Mihailov; Christine Blancher; Mauricio O Carneiro; Jared Maguire; Ryan Poplin; Khalid Shakir; Timothy Fennell; Mark DePristo; Martin Hrabé de Angelis; Panos Deloukas; Anette P Gjesing; Goo Jun; Peter Nilsson; Jacquelyn Murphy; Robert Onofrio; Barbara Thorand; Torben Hansen; Christa Meisinger; Frank B Hu; Bo Isomaa; Fredrik Karpe; Liming Liang; Annette Peters; Cornelia Huth; Stephen P O'Rahilly; Colin N A Palmer; Oluf Pedersen; Rainer Rauramaa; Jaakko Tuomilehto; Veikko Salomaa; Richard M Watanabe; Ann-Christine Syvänen; Richard N Bergman; Dwaipayan Bharadwaj; Erwin P Bottinger; Yoon Shin Cho; Giriraj R Chandak; Juliana C N Chan; Kee Seng Chia; Mark J Daly; Shah B Ebrahim; Claudia Langenberg; Paul Elliott; Kathleen A Jablonski; Donna M Lehman; Weiping Jia; Ronald C W Ma; Toni I Pollin; Manjinder Sandhu; Nikhil Tandon; Philippe Froguel; Inês Barroso; Yik Ying Teo; Eleftheria Zeggini; Ruth J F Loos; Kerrin S Small; Janina S Ried; Ralph A DeFronzo; Harald Grallert; Benjamin Glaser; Andres Metspalu; Nicholas J Wareham; Mark Walker; Eric Banks; Christian Gieger; Erik Ingelsson; Hae Kyung Im; Thomas Illig; Paul W Franks; Gemma Buck; Joseph Trakalo; David Buck; Inga Prokopenko; Reedik Mägi; Lars Lind; Yossi Farjoun; Katharine R Owen; Anna L Gloyn; Konstantin Strauch; Tiinamaija Tuomi; Jaspal Singh Kooner; Jong-Young Lee; Taesung Park; Peter Donnelly; Andrew D Morris; Andrew T Hattersley; Donald W Bowden; Francis S Collins; Gil Atzmon; John C Chambers; Timothy D Spector; Markku Laakso; Tim M Strom; Graeme I Bell; John Blangero; Ravindranath Duggirala; E Shyong Tai; Gilean McVean; Craig L Hanis; James G Wilson; Mark Seielstad; Timothy M Frayling; James B Meigs; Nancy J Cox; Rob Sladek; Eric S Lander; Stacey Gabriel; Noël P Burtt; Karen L Mohlke; Thomas Meitinger; Leif Groop; Goncalo Abecasis; Jose C Florez; Laura J Scott; Andrew P Morris; Hyun Min Kang; Michael Boehnke; David Altshuler; Mark I McCarthy
Journal:  Nature       Date:  2016-07-11       Impact factor: 69.504

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.