
Building a model for predicting metabolic syndrome using artificial intelligence based on an investigation of whole-genome sequencing.

Nai-Wei Hsu1, Kai-Chen Chou2, Chien-Feng Kuo1,3,4, Yu-Ting Tina Wang2, Chung-Lieh Hung1,5, Shin-Yi Tsai6,7,8,9,10.   

Abstract

BACKGROUND: The circadian system regulates various physiological activities and behaviors and has been gaining recognition. The circadian rhythm follows a 24-h cycle and is driven by transcriptional-translational feedback loops. When the circadian rhythm is interrupted and the expression of circadian genes is affected, disease phenotypes may be amplified. For example, mutations in genes coding for core components of the clock result in disease, which shows the importance of the internal temporal homeostasis conferred by the circadian system. This study investigates the association between circadian genes and metabolic syndrome in a Taiwanese population.
METHODS: We analyzed whole-genome sequencing data by reading VCF files and checking a set of target circadian genes for variants. We investigated the genetic contribution to circadian-related diseases using population-based next-generation whole-genome sequencing. We also used significant SNPs to create a metabolic syndrome prediction model. Logistic regression, random forest, adaboost, and neural network models were used to predict metabolic syndrome. In addition, we used the random forest variable importance matrix to select the 40 most important SNPs, which were then used to build new prediction models for comparison with the previous models. The data were split into training and testing sets using five-fold cross-validation. Each model was evaluated with the following criteria: area under the receiver operating characteristic curve (AUC), precision, F1 score, and average precision (the area under the precision-recall curve).
RESULTS: We used chi-square tests to search for significant variants and found 186 significant SNPs. For the four prediction models that used all 186 SNPs (logistic regression, random forest, adaboost and neural network), the AUCs were 0.68, 0.8, 0.82 and 0.81, respectively, and the F1 scores were 0.412, 0.078, 0.295 and 0.552, respectively. For the three models that used the 40 selected SNPs (logistic regression, adaboost and neural network), the AUCs were 0.82, 0.81 and 0.81, respectively, and the F1 scores were 0.584, 0.395 and 0.574, respectively.
CONCLUSIONS: Circadian gene defects may contribute to metabolic syndrome. Our study found several related genes and built a simple model to predict metabolic syndrome.
© 2022. The Author(s).


Keywords:  Circadian rhythm; Deep learning; Metabolic syndrome; Whole-genome sequencing


Year:  2022        PMID: 35484552      PMCID: PMC9052619          DOI: 10.1186/s12967-022-03379-7

Source DB:  PubMed          Journal:  J Transl Med        ISSN: 1479-5876            Impact factor:   8.440


Background

Metabolic syndrome (MetS) is a cluster of commonly concurrent metabolic risk factors associated with cardiovascular disease and type 2 diabetes mellitus, including elevated blood pressure, atherogenic dyslipidemia, insulin resistance, and central obesity (measured as waist circumference with ethnic-specific values). Metabolic syndrome can eventually lead to conditions such as chronic kidney disease (CKD) and atherosclerotic cardiovascular disease [1]. Risk factors for metabolic syndrome include family history, smoking, obesity, lack of physical activity and other lifestyle factors [2, 3]. Sugar-sweetened soft drinks have been reported to increase risk [4, 5]. Children with increased body mass index (BMI), systolic blood pressure (SBP) and triglyceride levels are believed to be at higher risk of developing MetS in middle age [6]. The prevalence of metabolic syndrome is highest among those who are overweight or obese. The International Diabetes Federation (IDF) estimated that one-quarter of the world’s population has metabolic syndrome. With respect to age, metabolic syndrome appears to be most common in people over 60 years of age [2]. On average, the prevalence of metabolic syndrome in adults is about 23% [7]. A national survey, the Nutrition and Health Survey in Taiwan (NAHSIT) 2005–2008, showed a significant increase in the prevalence of MetS over a period of 10–15 years, from 13.6% (1993–1996) to 25.5% (2005–2008) in males and from 26.4% to 31.5% in females. Metabolic syndrome is closely tied to diabetes, high blood pressure, heart disease and cerebrovascular disease, and these conditions and/or their associations are among the top ten causes of death in Taiwan [8]. Circadian rhythm plays an important role in endocrine secretion and body temperature [9]. An important aspect of circadian rhythms is that they persist in the absence of external cues [10].
Circadian genes, which are expressed periodically over an approximately 24-hour period, help to regulate metabolic genes [11-13]. Previous animal models have shown that knockout of specific circadian genes influences circadian behavior. Multiple transcription factors are now recognized to function in circadian gene regulation, and each of these has thousands of genomic DNA binding sites. Each of the circadian genes contributes directly to individual gene regulation in addition to its role in the reciprocal and homeostatic regulation of other clock genes by the transcriptional-translational feedback loops that define the clock itself [14]. Many diseases have been found to be related to circadian genes, including Alzheimer’s disease, Parkinson disease [15], atherosclerotic disease [16] and viral infection. Circadian rhythm also affects oxidative stress. If the human body or cells experience significant stress, their ability to regulate internal systems, including redox levels and circadian rhythms, may become impaired [17]. Animal studies have shown that risperidone may reset circadian rhythm [18]. Risperidone was found to induce cytotoxicity via increased reactive oxygen species (ROS), mitochondrial potential collapse, lysosomal membrane leakiness, GSH depletion and lipid peroxidation, and antioxidants such as coenzyme Q10 or N-acetyl cysteine may have a role as therapeutic options [19]. Circadian rhythm also plays a role in liver lipid metabolism, the renin-angiotensin system [20] and chronic fatigue syndrome [21, 22]. The timing of statin therapy may influence its effect [23]. The renin-angiotensin system was found to induce oxidative stress and fibrogenic cytokines [24]. Altering circadian rhythm may therefore strongly influence the treatment of chronic liver diseases. Increasing evidence shows that circadian clock genes may contribute to the development of metabolic syndrome [25, 26].
Circadian clocks regulate the timing of biological events including the sleep–wake cycle, energy metabolism, and hormone secretion. In an association and interaction analysis, Lin et al. proposed that many of the core circadian clock genes affect metabolic activity, which may lead to metabolic syndrome [27]. We targeted the core circadian clock genes that have been potentially linked with MetS.

Method

Study population

We used the Taiwan Biobank (TWB) NGS cohort as our study population. TWB collects lifestyle data, genomic data, and disease information from Taiwan residents. TWB recruits community-based volunteers who are 30 to 70 years of age and have no history of cancer. The cohort is based on recruitment and monitoring of the general Taiwanese population and has been used in previous genetic studies [28]. Our study included 642 TWB individuals with whole-genome sequencing (WGS) data.

Metabolic syndrome definition

According to the new International Diabetes Federation (IDF) definition, metabolic syndrome requires central obesity (measured as waist circumference with ethnic-specific values, see below) plus any 2 of the following 4 factors:
1. Triglycerides ≥ 150 mg/dL (1.7 mmol/L), or drug treatment for elevated triglycerides.
2. Fasting glucose ≥ 100 mg/dL, or previously diagnosed type 2 diabetes mellitus.
3. Reduced high-density lipoprotein (HDL) cholesterol, or drug treatment for reduced HDL cholesterol: < 40 mg/dL (1.0 mmol/L) in men; < 50 mg/dL (1.3 mmol/L) in women.
4. Elevated blood pressure, shown by any of the following: systolic blood pressure ≥ 130 mm Hg, diastolic blood pressure ≥ 85 mm Hg, or antihypertensive drug treatment in a patient with a history of hypertension.
Because our study took place in Taiwan with data from the Taiwan Biobank, we used the ethnic-specific waist circumference values for the “South Asians” and “Chinese” groups, where central obesity is defined as a waist circumference of ≥ 90 cm in males and ≥ 80 cm in females.
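
The definition above can be sketched as a small classifier. This is only an illustration under stated assumptions: the `Participant` record, its field names, and the function names are hypothetical and do not come from the Taiwan Biobank schema; only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record layout; field names are illustrative only.
    sex: str                       # "M" or "F"
    waist_cm: float
    triglycerides_mg_dl: float
    hdl_mg_dl: float
    systolic_mm_hg: float
    diastolic_mm_hg: float
    fasting_glucose_mg_dl: float
    on_tg_treatment: bool = False
    on_hdl_treatment: bool = False
    on_bp_treatment: bool = False
    diagnosed_t2dm: bool = False


def has_central_obesity(p: Participant) -> bool:
    # Waist cut-offs quoted in the text: >= 90 cm (males), >= 80 cm (females).
    return p.waist_cm >= (90.0 if p.sex == "M" else 80.0)


def count_risk_factors(p: Participant) -> int:
    high_tg = p.triglycerides_mg_dl >= 150 or p.on_tg_treatment
    low_hdl = p.hdl_mg_dl < (40 if p.sex == "M" else 50) or p.on_hdl_treatment
    high_bp = (p.systolic_mm_hg >= 130 or p.diastolic_mm_hg >= 85
               or p.on_bp_treatment)
    high_glucose = p.fasting_glucose_mg_dl >= 100 or p.diagnosed_t2dm
    return sum([high_tg, low_hdl, high_bp, high_glucose])


def has_metabolic_syndrome(p: Participant) -> bool:
    # Central obesity is mandatory; at least 2 of the 4 other factors are needed.
    return has_central_obesity(p) and count_risk_factors(p) >= 2
```

Note that a participant with several risk factors but a waist circumference below the cut-off is not classified as having metabolic syndrome, because central obesity is a required criterion in this definition.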

Finding suspected single nucleotide polymorphisms

This analysis covered a total of 642 WGS samples sequenced on the Illumina platform (of which 123 were defined as metabolic syndrome patients), with the following target genes: ALAS1, APOA5, ARNTL, BUD13, CETP, CLOCK, CRY1, CRY2, CSNK1D, CSNK1E, GSK3B, LIPA, NPAS2, NR1D1, PER1, PER2, PER3, RORA, RORB, RORC, SMAD2, SMAD3, SMAD4, TGFB2, TGFB3, TGFBR2 and other genes within the range of SNPs for analysis. The range of SNP was set between 17 and 37 (average of > 30) with Qual ≥ 30 [29]. However, during this experiment, the range of data analyzed was larger than originally expected because of a problem with the single nucleotide polymorphism (SNP) range set for CSNK1E. The definition of metabolic syndrome was based primarily on the physiological data in the Taiwan Biobank database. After the data were imported into an SQL server, the patients were grouped with database queries as the basis for subsequent analysis. The frequency of single-strand variation, double-strand variation or no variation in each group was counted. The formulas were then implemented in Python, and statistical analysis was applied to calculate the 95% confidence interval and the p value from the chi-square or Fisher’s exact test. After identifying significant SNPs, we conducted subgroup analysis to find out whether these SNPs are related to hypertension, low HDL level, diabetes or high TG level. Bonferroni correction was used to address multiple hypothesis testing: because there are 5 categories of metabolic syndrome criteria, the alpha value was set to 0.05/5 = 0.01.
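
The per-SNP statistics described above can be illustrated with a minimal standard-library sketch. It assumes a 2×2 table (genotype carriers vs. non-carriers, cases vs. controls); the counts, the function names, the Woolf (log-based) confidence interval, and the uncorrected Pearson chi-square are all assumptions made for illustration, since the text does not state which variants of these formulas were used.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a: carriers among cases        b: non-carriers among cases
    c: carriers among controls     d: non-carriers among controls
    All four counts must be non-zero for this formula.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def p_value_1df(chi2):
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, p_value_1df(chi2)


def bonferroni_alpha(alpha, n_tests):
    # Per-test significance threshold after Bonferroni correction.
    return alpha / n_tests


# Hypothetical counts for one SNP (not taken from the study data).
or_, lo, hi = odds_ratio_ci(20, 104, 40, 478)
chi2, p = chi_square_2x2(20, 104, 40, 478)
threshold = bonferroni_alpha(0.05, 5)
print(f"OR={or_:.3f} CI=({lo:.3f}, {hi:.3f}) chi2={chi2:.3f} p={p:.4f} "
      f"significant={p < threshold}")
```

For sparse tables (small expected counts), Fisher’s exact test, which the text also mentions, would be the more appropriate choice than the chi-square approximation used in this sketch.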

Statistical analyses

P values for continuous variables were calculated using Student’s t test. Categorical variables were compared using the chi-square test or exact test. Given the exploratory nature of this study, P < 0.05 was considered statistically significant. We used the caret package in R software version 4.04 for model prediction. We also used C#, Python and MySQL for data manipulation.

Creation of genome-based prediction model

We used significant SNPs to create a metabolic syndrome prediction model. Logistic regression, random forest, adaboost, and neural network models were used to predict metabolic syndrome. The data were split into training and testing sets using five-fold cross-validation. We assumed that SNPs have a cumulative effect, so we coded homozygous variants as 2, heterozygous variants as 1 and wild type as 0. Since weight may be influenced by these genes, weight was not used as a covariate [30]. Besides the four models mentioned above, we selected the 40 most important SNPs according to the random forest importance matrix and used them to create another three models with the logistic regression, adaboost and neural network methods (Fig. 1). We used a simple neural network with one hidden layer of 10 units and decay equal to 0. Each model was evaluated with the following criteria: area under the receiver operating characteristic curve (AUC), precision, F1 score, and average precision (the area under the precision-recall curve).
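
The additive genotype coding and the five-fold split can be sketched as follows. This is an illustrative Python sketch, not the authors' pipeline (the models themselves were built with the caret package in R); the function names, the VCF-style genotype strings, and the random seed are assumptions.

```python
import random


def encode_genotype(gt):
    """Map a VCF-style genotype string to an additive code.

    '0/0' -> 0 (wild type), '0/1' -> 1 (heterozygous), '1/1' -> 2 (homozygous).
    Any non-reference allele counts as a variant; missing calls return None.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 642 samples, as in the study cohort; each sample is tested exactly once.
for train_idx, test_idx in k_fold_indices(642, k=5):
    print(len(train_idx), len(test_idx))
```

With 123 or 124 cases among 642 samples the classes are imbalanced, so a stratified split that preserves the case fraction in each fold may be preferable to the plain shuffle shown here; the text does not say which was used.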
Fig. 1

Flow diagram for model building


Results

Baseline characteristic of metabolic syndrome individuals and control group

Among the 642 participants, there were 124 individuals with metabolic syndrome and 518 without. The mean age of the metabolic syndrome cohort was 51 years, and the mean age of the non-metabolic syndrome cohort was 44 years. Waistline, blood pressure, triglyceride level, hemoglobin A1C, fasting glucose and the percentage with diabetes mellitus were higher in metabolic syndrome patients than in those without metabolic syndrome. In addition, the high-density lipoprotein value was lower in metabolic syndrome patients than in those without, which corresponds to the definition of metabolic syndrome (Table 1).
Table 1

Baseline characteristic of the patients

| Variable | No metabolic syndrome (N = 518) | Metabolic syndrome (N = 124) | P-value |
| --- | --- | --- | --- |
| AGE (Years) | 44.48 ± 10.19 | 51.76 ± 10.02 | < 0.001 |
| HEIGHT (cm) | 165.44 ± 7.89 | 165.26 ± 8.63 | 0.831 |
| WEIGHT (Kg) | 64.7 ± 11.44 | 75.92 ± 12.89 | < 0.001 |
| WAISTLINE (cm) | 81.61 ± 9.11 | 93.03 ± 8.81 | < 0.001 |
| SBP (mmHg) | 111.43 ± 13.86 | 130.28 ± 16.89 | < 0.001 |
| DBP (mmHg) | 70.76 ± 9.69 | 81.92 ± 12 | < 0.001 |
| HBA1C (%) | 5.57 ± 0.51 | 6.28 ± 1.21 | < 0.001 |
| FASTING_GLUCOSE | 91.56 ± 11.69 | 111.7 ± 31.5 | < 0.001 |
| Total cholesterol | 190.68 ± 33.28 | 199.02 ± 40.62 | 0.036 |
| TG | 93.39 ± 54.47 | 211.32 ± 151.67 | < 0.001 |
| HDL_C | 55.47 ± 13.8 | 42.23 ± 9.95 | < 0.001 |
| LDL_C | 120.61 ± 31.01 | 122.8 ± 38.01 | 0.553 |
| BUN | 11.98 ± 3.29 | 13.68 ± 3.87 | < 0.001 |
| CREATININE | 0.73 ± 0.19 | 0.81 ± 0.28 | 0.005 |
| URIC_ACID | 5.43 ± 1.39 | 6.43 ± 1.52 | < 0.001 |
| SEX (female) | 231 (45%) | 49 (40%) | 0.402 |
| Diabetes (%) | 0 (0%) | 15 (12%) | < 0.001 |

P values are calculated from t-test for continuous variables or from chi-square test for categorical variables. SBP, systolic blood pressure; DBP, diastolic blood pressure; HDL_C, high density lipoprotein; LDL_C, low density lipoprotein; BUN, blood urea nitrogen

Table 1 shows the baseline values of the metabolic syndrome and control groups.

Spectrum of metabolic syndrome mutant alleles

We searched all alleles in the reference circadian genes and used the chi-square test to determine whether the heterozygous or homozygous genotype is related to metabolic syndrome. Among the genes searched, we found 186 significant SNPs associated with metabolic syndrome (Table 2). Among the 186 SNP alleles, we identified 47 alleles associated with hypertension (Table 3), 27 alleles associated with diabetes mellitus (Table 4), 10 alleles associated with low HDL-C (Table 5) and 46 alleles associated with high TG level (Table 6).
Table 2

Significant SNPs and odds ratio

| Gene refGene | rsId | HO_CI | HO_pvalues | HE_CI | HE_pvalues |
| --- | --- | --- | --- | --- | --- |
| GGTLC2;MIR650 | rs4050506 | 1.72–29.82 | 0.0006 | 0.01–0.55 | 0.0003 |
| GGTLC2;MIR650 | rs2904924 | 1.49–15.72 | 0.0027 | 0.01–0.65 | 0.0012 |
| APOL3 | rs132653 | 1.54–82.85 | 0.0012 | 0.01–0.65 | 0.0012 |
| APOL3 | rs132651 | 1.54–82.85 | 0.0012 | 0.01–0.67 | 0.0012 |
| APOL3 | rs4821460 | 1.5–80.84 | 0.0012 | 0.01–0.67 | 0.0012 |
| GGTLC2;MIR650 | rs4822280 | 1.36–6.72 | 0.0072 | 0.01–0.74 | 0.003 |
| GGTLC2;MIR650 | rs455194 | 1.65–28.64 | 0.001 | 0.04–0.62 | 0.001 |
| HPS4 | rs56782074 | 1.37–9.17 | 0.0138 | 0.34–0.92 | 0.0271 |
| TMEM211 | rs61643572 | 1.07–2.4 | 0.0282 | 0.37–0.84 | 0.0061 |
| TMEM211 | rs73879166 | 0.25–0.67 | 0.0005 | 1.49–4.03 | 0.0005 |
| EMID1 | rs2857463 | 0.07–0.81 | 0.0265 | 1.24–15.29 | 0.0265 |
| POM121L1P | rs6003123 | 1.18–2.62 | 0.0069 | 0.35–0.81 | 0.0038 |
| GGTLC2 | rs12484632 | 1.24–8 | 0.0122 | 0.09–0.74 | 0.004 |
| POM121L1P | rs3876045 | 1.12–5.1 | 0.0303 | 0.21–0.94 | 0.0428 |
| MYO18B | rs6004865 | 0.17–0.75 | 0.0079 | 1.14–2.52 | 0.0114 |
| APOL3 | rs132650 | 1.29–7.17 | 0.0123 | 0.11–0.71 | 0.0039 |
| PVALB | rs34262500 | 1.39–10.92 | 0.004 | 0.09–0.72 | 0.004 |
| APOL3 | rs35041494 | 1.16–3.96 | 0.0184 | 0.12–0.75 | 0.0057 |
| APOL4 | rs132718 | 1.04–11.27 | 0.0288 | 0.09–0.96 | 0.0288 |
| PRAMENP;VPREB1 | rs2330036 | 1.28–8.29 | 0.0083 | 0.1–0.78 | 0.0089 |
| CSF2RB;LL22NC01-81G9.3 | rs3950040 | 1.14–5.26 | 0.0329 | 0.38–0.95 | 0.0382 |
| MYO18B | rs2269635 | 1.1–2.44 | 0.0198 | 0.4–0.92 | 0.0254 |
| APOL3;APOL4 | rs132665 | 1.35–7.52 | 0.0084 | 0.13–0.74 | 0.0084 |
| LL22NC03-63E9.3;POM121L1P | rs964465 | 1.24–8 | 0.0122 | 0.13–0.84 | 0.012 |
| POM121L1P | rs3876046 | 1.02–2.35 | 0.0479 | 0.34–0.82 | 0.0061 |
| RORA | rs11430762 | 1.08–3.46 | 0.0324 | 0.3–0.96 | 0.0442 |
| LL22NC03-63E9.3;POM121L1P | rs457560 | 1.24–8 | 0.0122 | 0.13–0.86 | 0.0173 |
| LINC00895;SEPT5 | rs5746814 | 0.19–0.93 | 0.0405 | 1.15–2.53 | 0.0106 |
| LINC00895;SEPT5 | rs8143055 | 0.19–0.93 | 0.0405 | 1.13–2.49 | 0.0134 |
| NULL | rs62228082 | 1.21–7.85 | 0.0119 | 0.09–0.72 | 0.004 |
| CACNG2 | rs4821508 | 1.13–3.72 | 0.0254 | 0.35–0.84 | 0.0069 |
| GGTLC2;MIR650 | rs5759468 | 1.14–6.38 | 0.0296 | 0.16–0.88 | 0.0296 |
| APOL2 | rs132759 | 1.26–4.95 | 0.0103 | 0.18–0.76 | 0.0076 |
| CACNG2 | rs2013924 | 1.13–3.72 | 0.0254 | 0.38–0.89 | 0.0153 |
| SCARF2 | rs759609 | 1.07–2.52 | 0.0283 | 0.34–0.83 | 0.0075 |
| CACNG2 | rs4821506 | 1.07–3.9 | 0.0432 | 0.4–0.94 | 0.0325 |
| CACNG2 | rs2283981 | 1.13–3.72 | 0.0254 | 0.4–0.91 | 0.0217 |
| NULL | rs60580698 | 1.1–3.1 | 0.0254 | 0.34–0.97 | 0.047 |
| CES5AP1 | rs5751643 | 1.14–6.38 | 0.0296 | 0.17–0.93 | 0.0425 |
| GGTLC2;MIR650 | rs4820531 | 1.07–6.04 | 0.0425 | 0.17–0.93 | 0.0425 |

HO_CI, homozygous confidence interval; HE_CI, heterozygous confidence interval

P values are calculated from chi square test

Table 3

Hypertension related SNPs

| SNP | OR | lower | upper | refGene |
| --- | --- | --- | --- | --- |
| rs132759 | 1.871 | 1.095 | 3.423 | APOL2 |
| rs132665 | 1.893 | 1.011 | 3.879 | APOL3;APOL4 |
| rs2522291 | 0.696 | 0.514 | 0.945 | CECR2 |
| rs4820001 | 1.366 | 1.023 | 1.841 | CECR3;CECR2 |
| rs5747068 | 1.367 | 1.018 | 1.857 | CECR3;CECR2 |
| rs35305666 | 1.46 | 1.064 | 2.035 | DERL3 |
| rs5760061 | 1.454 | 1.1 | 1.939 | DERL3 |
| rs5760062 | 1.488 | 1.079 | 2.084 | DERL3 |
| rs443678 | 0.466 | 0.296 | 0.74 | DGCR8 |
| rs2078973 | 1.473 | 1.02 | 2.176 | DUSP18;SLC35E4 |
| rs4822280 | 1.507 | 1.031 | 2.347 | GGTLC2;MIR650 |
| rs4822932 | 1.385 | 1.008 | 1.891 | LOC100507657;MN1 |
| rs66786460 | 1.409 | 1.01 | 1.95 | LOC100507657;MN1 |
| rs9612154 | 1.337 | 1.03 | 1.742 | MIR650;MIR5571 |
| rs2070455 | 1.475 | 1.071 | 2.062 | MMP11 |
| rs5760012 | 1.502 | 1.09 | 2.101 | MMP11 |
| rs7289794 | 1.475 | 1.071 | 2.062 | MMP11 |
| rs738789 | 1.466 | 1.063 | 2.053 | MMP11 |
| rs738789 | 1.466 | 1.063 | 2.053 | MMP11 |
| rs60580698 | 0.793 | 0.647 | 0.97 | NULL |
| rs61408070 | 1.493 | 1.083 | 2.088 | NULL |
| Unknow06495 | 1.868 | 1.295 | 2.699 | NULL |
| rs395446 | 0.459 | 0.298 | 0.71 | RANBP1;TRMT2A |
| rs395446 | 0.459 | 0.298 | 0.71 | RANBP1;TRMT2A |
| rs759609 | 2.164 | 1.021 | 5.329 | SCARF2 |
| rs6494635 | 1.875 | 1.102 | 3.421 | SMAD3 |
| rs10681786 | 1.46 | 1.064 | 2.035 | SMARCB1 |
| rs1573277 | 1.488 | 1.079 | 2.084 | SMARCB1 |
| rs1972257 | 1.493 | 1.083 | 2.088 | SMARCB1 |
| rs1972257 | 1.493 | 1.083 | 2.088 | SMARCB1 |
| rs2070458 | 1.454 | 1.1 | 1.939 | SMARCB1 |
| rs2073392 | 1.488 | 1.079 | 2.084 | SMARCB1 |
| rs2186370 | 1.454 | 1.1 | 1.939 | SMARCB1 |
| rs2267039 | 1.454 | 1.1 | 1.939 | SMARCB1 |
| rs34378449 | 1.493 | 1.083 | 2.088 | SMARCB1 |
| rs5751740 | 1.502 | 1.09 | 2.101 | SMARCB1 |
| rs5751741 | 1.492 | 1.085 | 2.083 | SMARCB1 |
| rs5760038 | 1.479 | 1.075 | 2.066 | SMARCB1 |
| rs5760046 | 1.508 | 1.091 | 2.117 | SMARCB1 |
| rs5760046 | 1.508 | 1.091 | 2.117 | SMARCB1 |
| rs5760053 | 1.434 | 1.03 | 2.028 | SMARCB1 |
| rs5760057 | 1.51 | 1.098 | 2.109 | SMARCB1 |
| rs5996620 | 1.488 | 1.079 | 2.084 | SMARCB1 |
| rs9608201 | 1.454 | 1.1 | 1.939 | SMARCB1 |
| rs174877 | 0.486 | 0.3 | 0.799 | TANGO2 |
| rs61643572 | 1.616 | 1.06 | 2.43 | TMEM211 |
| rs73879166 | 1.616 | 1.06 | 2.43 | TMEM211 |

OR, odds ratio; lower, lower confidence interval; upper, upper confidence interval

Table 4

Diabetes mellitus related SNPs

| SNP | OR | lower | upper | refGene | HO |
| --- | --- | --- | --- | --- | --- |
| rs403517 | 1.441 | 1.049 | 2.008 | BMS1P20;ZNF280B | G/G |
| rs405570 | 1.422 | 1.045 | 1.96 | BMS1P20;ZNF280B | T/T |
| rs443678 | 0.599 | 0.375 | 0.975 | DGCR8 | C/C |
| rs5749150 | 1.96 | 1.252 | 3.215 | DUSP18;SLC35E4 | G/G |
| rs12484632 | 2.398 | 1.169 | 5.798 | GGTLC2 | G/G |
| rs455194 | 2.831 | 1.226 | 8.232 | GGTLC2;MIR650 | G/G |
| rs9623964 | 0.704 | 0.511 | 0.974 | IGLL5 | C/C |
| rs457560 | 3.511 | 1.54 | 10.139 | LL22NC03-63E9.3;POM121L1P | C/C |
| rs964465 | 3.556 | 1.539 | 10.335 | LL22NC03-63E9.3;POM121L1P | C/C |
| rs4822932 | 1.442 | 1.045 | 1.978 | LOC100507657;MN1 | T/T |
| rs66786460 | 1.582 | 1.133 | 2.194 | LOC100507657;MN1 | T/T |
| rs62228082 | 3.51 | 1.569 | 10.034 | NULL | G/G |
| Unknow06495 | 1.828 | 1.258 | 2.66 | NULL | T/T |
| rs140428 | 3.729 | 1.742 | 9.705 | POM121L1P | C/C |
| rs140428 | 3.729 | 1.742 | 9.705 | POM121L1P | C/C |
| rs3876045 | 2.9 | 1.397 | 7.413 | POM121L1P | C/C |
| rs3876046 | 3.596 | 1.597 | 10.313 | POM121L1P | G/G |
| rs6003123 | 3.424 | 1.48 | 9.959 | POM121L1P | G/G |
| rs2330036 | 0.33 | 0.121 | 0.941 | PRAMENP;VPREB1 | T/T |
| rs6003527 | 1.89 | 1.128 | 3.355 | RAB36 | A/A |
| rs395446 | 0.6 | 0.386 | 0.949 | RANBP1;TRMT2A | C/C |
| rs395446 | 0.6 | 0.386 | 0.949 | RANBP1;TRMT2A | C/C |
| rs61643572 | 1.681 | 1.098 | 2.539 | TMEM211 | G/G |
| rs73879166 | 1.681 | 1.098 | 2.539 | TMEM211 | A/A |
| rs5993853 | 2.446 | 1.183 | 5.941 | TXNRD2 | C/C |
| rs142445063 | 1.378 | 1.014 | 1.898 | ZNF280B | A/A |
| rs2051488 | 1.369 | 1.008 | 1.886 | ZNF280B | T/T |

OR, odds ratio; lower, lower confidence interval; upper, upper confidence interval

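Table 5

Low HDL-C related SNPs

| SNP | OR | lower | upper | refGene | HO |
| --- | --- | --- | --- | --- | --- |
| rs132651 | 5.443 | 1.664 | 33.543 | APOL3 | C/C |
| rs132653 | 5.522 | 1.671 | 34.152 | APOL3 | T/T |
| rs4821460 | 5.382 | 1.627 | 33.302 | APOL3 | G/G |
| rs132718 | 5.382 | 1.627 | 33.302 | APOL4 | G/G |
| rs2522291 | 0.716 | 0.522 | 0.988 | CECR2 | C/C |
| rs133119 | 0.643 | 0.451 | 0.927 | CRYBB2;IGLL3P | C/C |
| rs635361 | 1.644 | 1.038 | 2.722 | CRYBB2P1;GRK3 | G/G |
| rs35305666 | 1.461 | 1.045 | 2.078 | DERL3 | C/C |
| rs5760062 | 1.448 | 1.033 | 2.066 | DERL3 | G/G |
| rs28411685 | 2.038 | 1.255 | 3.513 | DGCR6L;LOC101927859 | A/A |
| rs6518604 | 1.803 | 1.141 | 3.007 | DGCR6L;LOC101927859 | A/A |
| rs901790 | 2.036 | 1.25 | 3.516 | DGCR6L;LOC101927859 | T/T |
| rs443678 | 0.443 | 0.278 | 0.715 | DGCR8 | C/C |
| rs42928 | 0.676 | 0.484 | 0.948 | GAL3ST1 | T/T |
| rs4050506 | 2.151 | 1.024 | 5.533 | GGTLC2;MIR650 | T/T |
| rs4822280 | 1.815 | 1.164 | 3.14 | GGTLC2;MIR650 | A/A |
| rs1005558 | 0.701 | 0.531 | 0.924 | ISX;LINC01399 | A/A |
| rs457560 | 2.564 | 1.187 | 6.707 | LL22NC03-63E9.3;POM121L1P | C/C |
| rs964465 | 2.576 | 1.174 | 6.801 | LL22NC03-63E9.3;POM121L1P | C/C |
| rs9617876 | 2.132 | 1.265 | 3.798 | LOC101927859 | T/T |
| rs9617876 | 2.132 | 1.265 | 3.798 | LOC101927859 | T/T |
| rs5760012 | 1.417 | 1.013 | 2.014 | MMP11 | A/A |
| rs33910051 | 1.493 | 1.041 | 2.22 | NULL | CCT/CCT |
| rs61408070 | 1.452 | 1.037 | 2.07 | NULL | AC/AC |
| rs62228082 | 2.591 | 1.227 | 6.691 | NULL | G/G |
| rs28437864 | 1.578 | 1.102 | 2.307 | POM121L1P | T/T |
| rs3876045 | 1.934 | 1.013 | 4.3 | POM121L1P | C/C |
| rs3876046 | 2.644 | 1.243 | 6.858 | POM121L1P | G/G |
| rs6003123 | 2.48 | 1.128 | 6.552 | POM121L1P | G/G |
| rs395446 | 0.506 | 0.325 | 0.799 | RANBP1;TRMT2A | C/C |
| rs395446 | 0.506 | 0.325 | 0.799 | RANBP1;TRMT2A | C/C |
| rs10681786 | 1.461 | 1.045 | 2.078 | SMARCB1 | ATATCT/ATATCT |
| rs1573277 | 1.448 | 1.033 | 2.066 | SMARCB1 | C/C |
| rs2073392 | 1.448 | 1.033 | 2.066 | SMARCB1 | G/G |
| rs34378449 | 1.452 | 1.037 | 2.07 | SMARCB1 | G/G |
| rs5751740 | 1.417 | 1.013 | 2.014 | SMARCB1 | A/A |
| rs5751741 | 1.452 | 1.039 | 2.066 | SMARCB1 | A/A |
| rs5760038 | 1.44 | 1.03 | 2.049 | SMARCB1 | C/C |
| rs5760046 | 1.473 | 1.048 | 2.108 | SMARCB1 | A/A |
| rs5760046 | 1.473 | 1.048 | 2.108 | SMARCB1 | A/A |
| rs5760057 | 1.469 | 1.051 | 2.091 | SMARCB1 | C/C |
| rs5996620 | 1.448 | 1.033 | 2.066 | SMARCB1 | G/G |
| rs3827341 | 0.647 | 0.484 | 0.864 | SYN3 | T/T |
| rs174877 | 0.387 | 0.238 | 0.641 | TANGO2 | C/C |

OR, odds ratio; lower, lower confidence interval; upper, upper confidence interval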

Table 6

Triglyceride level related SNPs

| SNP | OR | lower | upper | refGene | HO |
| --- | --- | --- | --- | --- | --- |
| rs132759 | 2.046 | 1.227 | 3.621 | APOL2 | C/C |
| rs2283809 | 0.68 | 0.51 | 0.909 | CRYBB3 | T/T |
| rs2097195 | 1.999 | 1.411 | 2.89 | GGTLC2;MIR650 | C/C |
| rs4822932 | 1.426 | 1.056 | 1.919 | LOC100507657;MN1 | T/T |
| rs66786460 | 1.408 | 1.026 | 1.921 | LOC100507657;MN1 | T/T |
| rs6004865 | 0.647 | 0.455 | 0.904 | MYO18B | C/C |
| rs200852194 | 1.497 | 1.018 | 2.262 | NULL | G/G |
| rs139726 | 1.557 | 1.198 | 2.035 | SGSM1 | A/A |
| rs139728 | 1.489 | 1.152 | 1.935 | SGSM1 | G/G |
| rs174877 | 0.604 | 0.376 | 0.983 | TANGO2 | C/C |

OR, odds ratio; lower, lower confidence interval; upper, upper confidence interval


Gene based prediction model

We applied different machine learning models, including logistic regression, random forest, adaboost and neural network, to predict metabolic syndrome from genetic data. For the four prediction models (logistic regression, random forest, adaboost and neural network), the AUCs were 0.68, 0.8, 0.82 and 0.8, respectively, and the F1 scores were 0.424, 0.525, 0.528 and 0.526, respectively (for details see Table 7). We then chose the 40 most important SNPs in the random forest model and used them as the new variables. We compared the 40 SNPs with the most significant OR values against the 40 most important SNPs in the random forest model and found only 11 overlapping SNPs (Table 8). For the SNP-selected models (logistic regression, adaboost and neural network), the AUCs were 0.82, 0.81 and 0.85, respectively. The precision values were 0.578, 0.415 and 0.5, and the F1 scores were 0.605, 0.54 and 0.583, respectively (Table 9). Overall, the feature-selected models performed better than the original models in terms of AUC and F1 score.
Table 7

Prediction model using all significant SNPs

| Model | AUC | Sens | Spec | Prec | F1 |
| --- | --- | --- | --- | --- | --- |
| logistic | 0.68 | 0.74 | 0.586 | 0.297 | 0.424 |
| random forest | 0.8 | 0.675 | 0.788 | 0.43 | 0.525 |
| adaboost | 0.82 | 0.764 | 0.732 | 0.403 | 0.528 |
| Neural network | 0.8 | 0.748 | 0.74 | 0.405 | 0.526 |

AUC, area under curve; Sens, sensitivity; Spec, specificity; Prec, precision value; F1, F1 score

Table 8

40 most important SNPs in random forest model and OR value

| RF_SNP | OR_SNP |
| --- | --- |
| rs4006261 | rs4050506 |
| rs60580698 | rs2904924 |
| rs9612154 | rs132653 |
| rs66786460 | rs132651 |
| rs9605406 | rs4821460 |
| rs56782074 | rs4822280 |
| rs11430762 | rs455194 |
| rs174877 | rs56782074 |
| rs2857463 | rs61643572 |
| rs133122 | rs73879166 |
| rs2283809 | rs2857463 |
| rs2331158 | rs6003123 |
| rs35251008 | rs12484632 |
| rs9606328 | rs3876045 |
| rs469995 | rs6004865 |
| rs34262500 | rs132650 |
| rs6003230 | rs34262500 |
| rs377976 | rs35041494 |
| rs61643572 | rs132718 |
| rs3950040 | rs2330036 |
| rs5756977 | rs3950040 |
| Unknow06495 | rs2269635 |
| rs5998659 | rs132665 |
| rs73879166 | rs964465 |
| rs131837 | rs3876046 |
| rs2254747 | rs11430762 |
| rs5748561 | rs457560 |
| rs2330036 | rs5746814 |
| rs4822689 | rs8143055 |
| rs1153417 | rs62228082 |
| rs2097195 | rs4821508 |
| rs2269635 | rs5759468 |
| rs2522291 | rs132759 |
| rs17209532 | rs2013924 |
| rs9944250 | rs759609 |
| rs737855 | rs4821506 |
| rs5746814 | rs2283981 |
| rs28437864 | rs60580698 |
| rs1059142 | rs5751643 |
| rs4822932 | rs4820531 |

RF_SNP, Random forest model 40 most important SNP; OR_SNP, 40 most important SNPs according to odds ratio value
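
The overlap between the two columns of Table 8 can be checked with a set intersection. The sketch below copies the identifiers from Table 8 as listed; the variable names are illustrative.

```python
# Identifiers copied from Table 8 (RF_SNP and OR_SNP columns, in table order).
rf_snps = """
rs4006261 rs60580698 rs9612154 rs66786460 rs9605406 rs56782074 rs11430762
rs174877 rs2857463 rs133122 rs2283809 rs2331158 rs35251008 rs9606328
rs469995 rs34262500 rs6003230 rs377976 rs61643572 rs3950040 rs5756977
Unknow06495 rs5998659 rs73879166 rs131837 rs2254747 rs5748561 rs2330036
rs4822689 rs1153417 rs2097195 rs2269635 rs2522291 rs17209532 rs9944250
rs737855 rs5746814 rs28437864 rs1059142 rs4822932
""".split()

or_snps = """
rs4050506 rs2904924 rs132653 rs132651 rs4821460 rs4822280 rs455194
rs56782074 rs61643572 rs73879166 rs2857463 rs6003123 rs12484632 rs3876045
rs6004865 rs132650 rs34262500 rs35041494 rs132718 rs2330036 rs3950040
rs2269635 rs132665 rs964465 rs3876046 rs11430762 rs457560 rs5746814
rs8143055 rs62228082 rs4821508 rs5759468 rs132759 rs2013924 rs759609
rs4821506 rs2283981 rs60580698 rs5751643 rs4820531
""".split()

# SNPs ranked in the top 40 by both criteria.
overlap = sorted(set(rf_snps) & set(or_snps))
print(len(overlap), overlap)
```

Running this on the identifiers as listed gives 11 shared SNPs, which agrees with the count reported in the text.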

Table 9

Prediction model using feature selecting SNPs

Feature selection: random forest 40 most important SNPs

| Model | AUC | Sens | Spec | Prec | F1 |
| --- | --- | --- | --- | --- | --- |
| logistic | 0.82 | 0.634 | 0.89 | 0.578 | 0.605 |
| adaboost | 0.81 | 0.772 | 0.742 | 0.415 | 0.54 |
| Neural network | 0.85 | 0.699 | 0.834 | 0.5 | 0.583 |

AUC, area under curve; Sens, sensitivity; Spec, specificity; Prec, precision value; F1, F1 score
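
The F1 column in Tables 7 and 9 can be cross-checked from the precision and sensitivity columns, since F1 is the harmonic mean of precision and recall (sensitivity). The sketch below is illustrative; the dictionary layout and names are assumptions, and the values are copied from the two tables.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, sensitivity, reported F1) copied from Tables 7 and 9.
reported = {
    "Table 7 logistic":       (0.297, 0.74, 0.424),
    "Table 7 random forest":  (0.43, 0.675, 0.525),
    "Table 7 adaboost":       (0.403, 0.764, 0.528),
    "Table 7 neural network": (0.405, 0.748, 0.526),
    "Table 9 logistic":       (0.578, 0.634, 0.605),
    "Table 9 adaboost":       (0.415, 0.772, 0.54),
    "Table 9 neural network": (0.5, 0.699, 0.583),
}

for name, (prec, sens, f1_reported) in reported.items():
    f1 = f1_score(prec, sens)
    print(f"{name}: computed F1={f1:.3f}, reported F1={f1_reported}")
```

The recomputed values agree with the reported F1 scores to within rounding of the three-decimal inputs.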


Discussion

In this study, we found 186 circadian gene SNPs related to metabolic syndrome, of which 8 SNPs were related to apolipoprotein. Previous studies have shown that apolipoprotein E knockout mice are more likely to develop cardiovascular disease after circadian rhythm is interrupted [31, 32]. Circadian rhythm disorders can alter the body’s metabolic factors, including cholesterol profile and apolipoprotein [33]. Another animal study also found that apolipoprotein E knockout mice developed cardiovascular disease more rapidly after circadian rhythm alteration [34]. Our study also showed that apolipoprotein is related to high TG level, low HDL level and hypertension (HTN). Rs132759 in APOL2 is correlated with both HTN and low HDL level. Previous studies have shown that APOL2 may be related to the acute inflammation response and lipid metabolic processes [35, 36]. To our knowledge, our study is the first to identify that APOL2 is correlated with HTN. There are 5 SNPs located at BMS1P20, which is a long non-coding RNA (lncRNA). Previous studies have shown that BMS1P20 is positively correlated with cancer patients’ overall survival, especially in lung adenocarcinoma [37]. There is also a hypothesis that lncRNAs regulate cells through the lncRNA-miRNA-mRNA ceRNA network [38]. Some lncRNAs have been reported to correlate with metabolism, such as 116HG, H19, HOTAIR and MIAT [39-41]. We found that rs403517 and rs405570 in BMS1P20 are related to DM, and we believe our study is the first to report that the BMS1P20 lncRNA is related to metabolic syndrome. The MYO18B gene encodes a myosin heavy chain that is expressed in human cardiac and skeletal muscle [42]. Some studies have shown that MYO18B mutation is associated with myopathy or cardiomyopathy in animal models or in humans [43, 44]. One animal study also showed that MYO18B gene expression is regulated by circadian rhythm [45].
In our study, we found that MYO18B is also associated with metabolic syndrome, especially rs6004865, which is associated with low HDL levels. Because the SNPs we found in MYO18B are all intronic or intergenic, more studies are needed to clarify the relationship between MYO18B and metabolic syndrome. Many studies have explored the RORA gene and its relation to circadian rhythm, and RORA is associated with many psychiatric disorders including major depressive disorder, bipolar disorder and sleep disturbance disorder [46-48]. RORA gene mutations also affect the use of substances such as alcohol, tea, tobacco and caffeine [47]. This matters because it is widely accepted that smoking and alcohol consumption increase the risk of developing metabolic syndrome. An animal study showed that suppression of RORA gene activity improves metabolic functions and reduces inflammation [49]. Many studies have found that SMARCB1 is a tumor suppressor gene related to different types of cancer [50]. Recent studies have shown that circadian clock oscillation develops during cell differentiation and that some cancer cells lack the circadian gene, which may reflect the similarity between embryonic stem cells and cancer cell types [51]. Our study found that multiple SNPs in the SMARCB1 gene (rs5751740, rs5751741, rs5760038, rs5760046, rs5760057, rs5996620) are related to both high TG level and hypertension. However, the exact mechanism is still unknown. ZNF280B is an oncogene in prostate cancer and gastric cancer [52]. Our study is the first to point out that ZNF280B mutation is related to metabolic syndrome: rs142445063 and rs2051488 were related to diabetes mellitus in our study. A previous study used different machine learning methods to predict metabolic syndrome, including both clinical and genetic information in the model [53]. In our study, either the entire dataset or selected SNPs were used in different models.
The accuracy, AUC value and F1 value improved in the SNP-selected models. Previous studies have shown that models with feature selection perform better [54]. The advantages of this study are as follows. First, we examined multiple circadian genes and found multiple SNPs associated with metabolic syndrome, some of which are reported here for the first time. Among the significant SNPs, we performed subgroup analysis to find out which SNPs correspond to the different metabolic syndrome criteria. Second, based on genetic information, we used four machine learning models to predict metabolic syndrome, which to our knowledge has not been done in previous studies, and the AUC reached 0.85 in the SNP-selected model. Nevertheless, our study has several limitations. First, the sample size is small and only includes healthy and aware Taiwanese participants. Therefore, this study should be replicated and validated in other populations. Second, this was a cross-sectional study, so it is difficult to establish causal relationships. Third, we only used circadian gene SNPs in our prediction model. Other metabolic syndrome related SNPs or biomarkers could be included to increase accuracy.

Conclusion

We identified 186 circadian gene SNPs related to metabolic syndrome. Among these SNPs, there are 47 alleles associated with hypertension, 46 alleles associated with high serum TG levels, 27 alleles associated with diabetes mellitus and 10 alleles associated with low serum HDL levels. Some SNPs are reported here for the first time as related to metabolic syndrome. Additional research is needed to confirm these SNPs. In addition, we applied several machine learning models to predict metabolic syndrome based on circadian gene data. We found that it is difficult to produce a high-sensitivity model. Other clinical data should be added to create a higher-sensitivity model (Additional files 1, 2, 3, 4, 5, 6, 7, 8).
Additional file 1: Table S1. Summary of the 186 significant circadian gene SNPs.
Additional file 2: Supplementary figure S2. AUC curve of neural network.
Additional file 3: Supplementary figure S3. Precision-Recall curve of neural network.
Additional file 4: Supplementary figure S4. AUC curve of Adaboost model.
Additional file 5: Supplementary figure S5. Precision-Recall curve of Adaboost model.
Additional file 6: Supplementary figure S6. AUC curve of logistic regression.
Additional file 7: Supplementary figure S7. Precision-Recall curve of logistic regression.
Additional file 8: Supplementary figure S8. Biological pathways-based analysis of circadian rhythm (1).
Reference
1. Reactome
  52 in total

Review 1.  Epidemiology of obesity, the metabolic syndrome, and chronic kidney disease.

Authors:  Rikki M Tanner; Todd M Brown; Paul Muntner
Journal:  Curr Hypertens Rep       Date:  2012-04       Impact factor: 5.369

2.  Identification of human circadian genes based on time course gene expression profiles by using a deep learning method.

Authors:  Peng Cui; Tingyan Zhong; Zhuo Wang; Tao Wang; Hongyu Zhao; Chenglin Liu; Hui Lu
Journal:  Biochim Biophys Acta Mol Basis Dis       Date:  2017-12-12       Impact factor: 5.187

Review 3.  Circadian Rhythms in Diet-Induced Obesity.

Authors:  Atilla Engin
Journal:  Adv Exp Med Biol       Date:  2017       Impact factor: 2.622

4.  Pro-inflammatory genes as biomarkers and therapeutic targets in oral squamous cell carcinoma.

Authors:  Shailaja K Rao; Zoran Pavicevic; Ziyun Du; Jong-Gwan Kim; Meiyun Fan; Yan Jiao; Molly Rosebush; Sandeep Samant; Weikuan Gu; Lawrence M Pfeffer; Christopher A Nosrat
Journal:  J Biol Chem       Date:  2010-08-11       Impact factor: 5.157

Review 5.  Metabolic syndrome.

Authors:  Susan L Samson; Alan J Garber
Journal:  Endocrinol Metab Clin North Am       Date:  2014-03       Impact factor: 4.741

6.  ZNF280B promotes the growth of gastric cancer in vitro and in vivo.

Authors:  Jingming Zhai; Zheng Yang; Xiaodong Cai; Guoliang Yao; Yanhui An; Wei Wang; Yonggang Fan; Chao Zeng; Kefeng Liu
Journal:  Oncol Lett       Date:  2018-02-15       Impact factor: 2.967

Review 7.  Circadian clock genes and the transcriptional architecture of the clock mechanism.

Authors:  Kimberly H Cox; Joseph S Takahashi
Journal:  J Mol Endocrinol       Date:  2019-11       Impact factor: 5.098

Review 8.  Sugar-sweetened beverages and risk of metabolic syndrome and type 2 diabetes: a meta-analysis.

Authors:  Vasanti S Malik; Barry M Popkin; George A Bray; Jean-Pierre Després; Walter C Willett; Frank B Hu
Journal:  Diabetes Care       Date:  2010-08-06       Impact factor: 19.112

9.  Increased risk of chronic fatigue syndrome in patients with inflammatory bowel disease: a population-based retrospective cohort study.

Authors:  Shin-Yi Tsai; Hsuan-Ju Chen; Chon-Fu Lio; Chien-Feng Kuo; An-Chun Kao; Wei-Shieng Wang; Wei-Cheng Yao; Chi Chen; Tse-Yen Yang
Journal:  J Transl Med       Date:  2019-02-22       Impact factor: 5.531

10.  Aberrant functional connectivity between the suprachiasmatic nucleus and the superior temporal gyrus: Bridging RORA gene polymorphism with diurnal mood variation in major depressive disorder.

Authors:  Zhilu Chen; Shiwan Tao; Rongxin Zhu; Shui Tian; Yurong Sun; Huan Wang; Rui Yan; Junneng Shao; Yujie Zhang; Jie Zhang; Zhijian Yao; Qing Lu
Journal:  J Psychiatr Res       Date:  2020-10-04       Impact factor: 4.791
