
Identification of dysregulated modules based on network entropy in type 1 diabetes.

Abstract

Type 1 diabetes is a prevalent autoimmune disease whose underlying mechanisms remain to be elucidated. The aim of the study was to identify dysregulated modules of type 1 diabetes. After microarray data were preprocessed, 20,545 genes were obtained. By integrating gene expression data and protein-protein interaction (PPI) data, 48,778 interactions involving 7,953 genes were obtained. After simplifying the networks, we obtained 24 target networks. By ranking the networks by P-value, two modules with P<0.05 were identified, comprising the genes CCNB1, CDC45, GINS2, NDC80, FBXO5, NCAPG and DLGAP5. Module 2 was part of module 1. The identified modules and genes may provide new insights into the underlying biological mechanisms that drive the progression of type 1 diabetes.


Keywords: dysregulated module; entropy; network; protein-protein interaction; type 1 diabetes

Year: 2018 PMID: 29545837 PMCID: PMC5841047 DOI: 10.3892/etm.2018.5803

Source DB: PubMed Journal: Exp Ther Med ISSN: 1792-0981 Impact factor: 2.447

Introduction

Type 1 diabetes is an autoimmune disease characterized by the T cell-mediated destruction of insulin-producing β-cells in the islets of Langerhans (1). It is one of the most common chronic diseases of childhood (2), particularly in boys (3). Despite recent broad organisational, intellectual and fiscal investments, there is no valid method to prevent or cure type 1 diabetes. Elucidating its mechanisms is therefore critical for clinical diagnosis and treatment, and the aim of the present study was to explore the molecular mechanisms of type 1 diabetes.
The determinants of diabetes pathology are complex and include both environmental and genetic factors. It is generally accepted that environmental agents initiate the pathologic process in type 1 diabetes, as many cases are diagnosed in autumn and winter (4). Birth during spring is also associated with a higher chance of having type 1 diabetes (5). Efforts have been made to develop models that describe the influence of the environment on type 1 diabetes, including the gut microbiome (6) and the hygiene hypothesis (3). However, no specific agent has been identified with an unequivocal influence on pathogenesis. Type 1 diabetes is clearly a polygenic disorder, with 50 susceptibility regions having been identified (3), of which the human leukocyte antigen (HLA) region on chromosome 6 (7) potentially provides half of the genetic susceptibility, especially HLA class II alleles (3). Most of the associated loci are thought to be involved in immune responses (8). According to the literature (3), the associated SNPs are localized to enhancer sequences active in the thymus, T and B cells, and CD34+ stem cells. Although there has been considerable research on type 1 diabetes progression, the resulting data are large and complex. Network-based approaches have been suggested as a powerful tool for studying the complex behavior of biological systems (3).
To elucidate the molecular mechanisms of type 1 diabetes, we introduced a new method to screen differential modules between the disease and normal groups. We downloaded gene expression data of type 1 diabetes from the ArrayExpress database. By combining gene expression data and protein-protein interaction (PPI) data, we constructed target networks. The local and global entropy of each network were calculated to screen for differential modules between the diabetes and normal groups.

Materials and methods

Gene expression data

Microarray data of E-GEOD-10586 (3), along with the annotation files, were downloaded from the ArrayExpress database. The data included 12 diabetes patients and 15 healthy controls. The platform in the present study was A-AFFY-44-Affymetrix GeneChip Human Genome U133 Plus 2.0 [HG-U133_Plus_2].

Data preprocessing

Microarray data were preprocessed as follows. To eliminate the influence of non-specific hybridization, background correction was performed with the robust multichip average (RMA) method (3). The data were then normalized using the quantiles method (9), perfect match (PM)/mismatch (MM) correction was conducted using the MAS method (10), and the data were summarized by median polish (11). The microarray data were then transformed into an expression set. According to the gene IDs and symbols in the annotation file of the platform, probe IDs were mapped to gene symbols. Finally, expression profiles of 20,545 genes were obtained.
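The quantile normalization step can be illustrated with a minimal Python sketch. It is not the authors' pipeline (array preprocessing of this kind is normally run with dedicated microarray software); the function name `quantile_normalize` and the positional handling of ties are assumptions made for illustration only.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length sample columns.

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n_rows = len(columns[0])
    # Sort every sample, then average across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(columns) for r in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # Row indices of this sample in ascending order of intensity.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


samples = [[5, 2, 3], [4, 1, 6]]  # two hypothetical arrays, three probes each
print(quantile_normalize(samples))
```

After normalization every sample has the same set of values, so differences between arrays reflect rank changes rather than overall intensity shifts.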

PPI networks construction

Human PPI data were downloaded from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (12), including 16,730 genes and 1,048,576 interaction pairs. Protein IDs were converted to gene symbols. Self-loops and proteins without expression values were removed. The combined score was used to assess the strength of the relationship between two genes. To select PPIs with a closer relationship, we set the criterion of combined score ≥0.8, generating a new PPI network with 8,590 nodes and 53,975 edges. Gene interactions in the PPI network were reweighted using Pearson's correlation coefficient (PCC), and the absolute value of the PCC for each interaction was taken as its interactive score. Finally, the PPI networks of the normal and diabetes groups were constructed using these interactive scores.
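The filtering and reweighting steps can be sketched as follows. This is a minimal illustration under stated assumptions: the edge list is given as `(gene_a, gene_b, combined_score)` tuples with scores on a 0 to 1 scale, expression profiles are plain lists keyed by gene symbol, and the names `pearson` and `reweight_ppi` are hypothetical. Running it once per group (normal, diabetes) would give one weighted network per condition.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def reweight_ppi(edges, expression, min_score=0.8):
    """Keep confident edges between expressed genes; weight them by |PCC|."""
    weighted = {}
    for a, b, score in edges:
        if a == b or score < min_score:
            continue  # drop self-loops and low-confidence interactions
        if a not in expression or b not in expression:
            continue  # drop proteins without expression values
        weighted[frozenset((a, b))] = abs(pearson(expression[a], expression[b]))
    return weighted


edges = [("A", "B", 0.9), ("A", "C", 0.5), ("A", "A", 0.95), ("B", "D", 0.9)]
expression = {"A": [1, 2, 3], "B": [3, 2, 1], "C": [1, 2, 3]}
print(reweight_ppi(edges, expression))
```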

Comparison and identification of modules

To construct networks in the disease and normal groups, we applied a module-identification algorithm based on clique merging, according to Srihari and Ragan (13). The algorithm comprised two steps. First, all maximal cliques were found in the PPI networks of the normal and diabetes groups using a fast depth-first method (14). Maximal cliques (26,580) were found in both groups and ranked in non-increasing order of their weighted interaction densities. Second, the cliques were ranked according to their weighted interaction density (3), and highly overlapping cliques were merged or removed. The score of a clique C was defined as its weighted interaction density:
$$\mathrm{score}(C) = \frac{\sum_{i, j \in C,\ i \neq j} \omega(i, j)}{|C| \cdot (|C| - 1)}$$
where ω(i, j) indicates the weight of the interaction between i and j. In total, 8,002 maximal cliques were identified in a PPI network, and overlapping cliques had to be removed or merged. The inter-connectivity between two cliques was used to determine whether two overlapping cliques should be merged. The inter-connectivity between the non-overlapping proteins of C1 and C2 was calculated as
$$\text{inter-score}(C_1, C_2) = \sqrt{\frac{\sum_{i \in (C_1 - C_2)} \sum_{j \in C_2} \omega(i, j)}{|C_1 - C_2| \cdot |C_2|} \cdot \frac{\sum_{i \in (C_2 - C_1)} \sum_{j \in C_1} \omega(i, j)}{|C_2 - C_1| \cdot |C_1|}}$$
Given a set of cliques ranked in descending order of their score, denoted as {C1, C2, …, Ck}, the clustering based on maximal cliques (CMC) algorithm removed or merged highly overlapping cliques as follows. For every clique Ci, the algorithm checked whether there existed a clique Cj such that Cj had a lower score than Ci and
$$\frac{|C_i \cap C_j|}{|C_j|} \geq \text{overlap-threshold}$$
where overlap-threshold was a predefined threshold for overlapping. If such a Cj existed, the weighted interconnecting score of the non-shared nodes in the two cliques was calculated, and this inter-score between Ci and Cj was used to decide whether to remove Cj or merge it with Ci. If inter-score(Ci, Cj) ≥ merge-threshold (tm), then Cj was merged with Ci to form a module; otherwise, Cj was removed. In this study, the overlap-threshold was set to 0.5 and the merge-threshold was set to 0.25.
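The scoring and merging procedure can be sketched in Python. This is an illustrative sketch rather than the authors' implementation, and it rests on several assumptions: the weighted interaction density is taken as the mean edge weight over all pairs of clique members, the overlap is measured as |Ci ∩ Cj| / |Cj|, the inter-score is the geometric mean of the two normalized cross-clique weight sums, and overlap is tested against the original higher-scoring clique rather than the growing module. Maximal cliques are assumed to be already enumerated; all function names are hypothetical.

```python
import math
from itertools import combinations


def edge_weight(weights, a, b):
    """Weight of the interaction a-b, or 0.0 if the pair does not interact."""
    return weights.get(frozenset((a, b)), 0.0)


def clique_score(clique, weights):
    """Mean edge weight over all unordered pairs of clique members."""
    pairs = list(combinations(sorted(clique), 2))
    if not pairs:
        return 0.0
    return sum(edge_weight(weights, a, b) for a, b in pairs) / len(pairs)


def inter_score(c1, c2, weights):
    """Inter-connectivity between the non-shared members of two cliques."""
    only1, only2 = c1 - c2, c2 - c1
    if not only1 or not only2:
        return 0.0  # assumption: treated as 0 when one clique contains the other
    link12 = sum(edge_weight(weights, i, j) for i in only1 for j in c2)
    link21 = sum(edge_weight(weights, i, j) for i in only2 for j in c1)
    return math.sqrt(
        link12 / (len(only1) * len(c2)) * link21 / (len(only2) * len(c1))
    )


def merge_cliques(cliques, weights, overlap_threshold=0.5, merge_threshold=0.25):
    """Merge or drop lower-scoring cliques that overlap a higher-scoring one."""
    ranked = sorted(
        (set(c) for c in cliques),
        key=lambda c: clique_score(c, weights),
        reverse=True,
    )
    modules = []
    while ranked:
        seed = ranked.pop(0)
        module, kept = set(seed), []
        for other in ranked:
            if len(seed & other) / len(other) >= overlap_threshold:
                if inter_score(seed, other, weights) >= merge_threshold:
                    module |= other  # merge into the higher-scoring clique
                # otherwise the overlapping clique is removed
            else:
                kept.append(other)
        ranked = kept
        modules.append(module)
    return modules
```

With the thresholds used in the study (0.5 and 0.25) as defaults, a call such as `merge_cliques(cliques, weights)` returns the modules in order of the score of the clique that seeded them.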

Identification of differential modules

To identify differential modules between the disease group and the normal group, we constructed target networks and performed the Wilcoxon rank sum test (15).

Comparing modules across conditions

To search for similar or identical modules between the normal and diabetes groups, module correlation densities were calculated. Let S = {S1, S2, …, Sn} and T = {T1, T2, …, Tm} be the sets of modules identified from the normal and disease networks, respectively. For each Si ∈ S, the module correlation density was calculated as:
$$d_{cc}(S_i) = \frac{\sum_{p, q \in S_i} \mathrm{PCC}((p, q), N)}{\binom{|S_i|}{2}}$$
where (p, q) is a protein pair, PCC((p, q), N) is the Pearson's correlation of (p, q) under normal conditions, and Si is the i-th module identified from the networks. The correlation densities for disease modules were calculated similarly. After all the modules were examined, 69 pairs of similar or identical modules were identified.
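A module correlation density of this kind can be computed with a few lines of Python. The sketch assumes the density is the sum of pairwise Pearson correlations within a module divided by the number of protein pairs, and that pairs without a recorded correlation contribute 0; the function name `correlation_density` is hypothetical.

```python
from itertools import combinations


def correlation_density(module, pcc):
    """Mean pairwise correlation over all protein pairs in a module.

    `pcc` maps frozenset({p, q}) to the Pearson correlation of p and q
    in one condition; missing pairs are counted as 0 (an assumption).
    """
    pairs = list(combinations(sorted(module), 2))
    if not pairs:
        return 0.0
    return sum(pcc.get(frozenset(pair), 0.0) for pair in pairs) / len(pairs)


normal_pcc = {frozenset(("A", "B")): 0.6, frozenset(("A", "C")): 0.3}
print(correlation_density({"A", "B", "C"}, normal_pcc))
```

Evaluating the same module with the correlations of each condition gives two densities that can be compared directly.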

Construction of target network

Genes and interactions shared by the normal and disease modules were retained, generating a new network, designated as the target network. In total, 24 target networks were identified. To compare the features of the target networks, network entropy was calculated (16). The local network entropy of a node i, denoted Si, is defined as
$$S_i = -\frac{1}{\log k_i} \sum_{j \in N(i)} p_{ij} \log p_{ij}$$
where ki is the degree of node i, N(i) is the set of neighbor nodes of node i and pij defines a stochastic probability matrix on the network, which is defined by
$$p_{ij} = \frac{c_{ij}}{\sum_{k \in N(i)} c_{ik}}$$
where cij is the Pearson's correlation coefficient (PCC) between protein i and protein j. The global network entropy, denoted S, was defined in terms of n, the total number of nodes in the network, and Ci, the degree centrality of node i. The differential network entropy, ΔS, was defined in terms of S_i^I and S_i^N, the local network entropy of node i in the disease and normal networks, respectively.
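The local network entropy of a node can be sketched as below. The sketch assumes the commonly used normalized Shannon form, in which the absolute correlations to a node's neighbors are rescaled to sum to 1 and the resulting entropy is divided by the logarithm of the node degree. Nodes with fewer than two neighbors are assigned 0 to avoid dividing by log(1). The data layout (a dict of dicts) and the function name are hypothetical.

```python
import math


def local_entropy(node, pcc):
    """Normalized Shannon entropy of a node's neighbor-weight distribution.

    `pcc[node]` maps each neighbor to its correlation with `node`;
    absolute values are used as weights.
    """
    weights = [abs(c) for c in pcc[node].values()]
    k = len(weights)
    total = sum(weights)
    if k < 2 or total == 0.0:
        return 0.0  # assumption: entropy is 0 when it is otherwise undefined
    entropy = 0.0
    for w in weights:
        p = w / total  # row-normalized weight, so the p values sum to 1
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy / math.log(k)


network = {
    "A": {"B": 0.5, "C": -0.5, "D": 0.5},  # evenly weighted neighbors
    "E": {"F": 0.9, "G": 0.1},             # one dominant neighbor
}
print(local_entropy("A", network), local_entropy("E", network))
```

A value near 1 means the node's interactions are evenly weighted, and a value near 0 means one interaction dominates.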

Significance test

To determine whether the distributions of local network entropy in the disease and normal networks were significantly different, we performed the non-parametric one-tailed Wilcoxon rank sum test (15). The disease sample labels were permuted and the global entropy of the networks in the disease and normal groups was recalculated. This process was repeated L times, and a P-value was calculated from the permuted values. The P-value was used as a measure of the degree of difference between the values in the two networks. P<0.05 was considered to indicate a statistically significant difference.
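A label-permutation test of this kind can be sketched generically in Python. The exact P-value formula used in the study is not recoverable from the text, so the estimate below, (hits + 1) / (L + 1), is an assumption, as are the function name and the use of a caller-supplied `statistic`. In the study's setting, that statistic would recompute the network entropy difference from the permuted sample groups.

```python
import random


def permutation_p_value(statistic, disease, normal, n_perm=1000, seed=0):
    """One-tailed empirical P-value from permuting sample labels.

    `statistic(group_a, group_b)` returns a number; larger values mean a
    larger difference between the groups.
    """
    rng = random.Random(seed)
    observed = statistic(disease, normal)
    pooled = list(disease) + list(normal)
    n_disease = len(disease)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign disease/normal labels at random
        if statistic(pooled[:n_disease], pooled[n_disease:]) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def mean_gap(a, b):
    """Absolute difference of group means (a stand-in statistic)."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


print(permutation_p_value(mean_gap, [10, 11, 12, 13, 14, 15], [1, 2, 3, 4, 5, 6]))
```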

Results

After data preprocessing, 20,545 genes were obtained. Protein IDs in the PPI network were converted to gene symbols. Based on the criterion of combined score ≥0.8, 53,975 interactions and 8,590 nodes were obtained. The PPI network was then compared with the gene expression data, and only the interactions present in both were retained. In total, 48,778 PPI interactions involving 7,953 genes were obtained.

Identifying dysregulated modules

Modules consisting of genes shared by the disease and normal groups were regarded as target networks. In total, 24 networks were obtained. The global entropy of each network was calculated from the local entropy of its nodes. After the significance test, P-values of the networks were obtained. Two significantly differential modules were identified with P<0.05 (Table I). Module 1 consisted of 7 genes and 21 interactions (Fig. 1), so each gene interacted with every other gene. Module 2 consisted of 4 genes and 6 interactions (Fig. 2). Module 2 was part of module 1, as its 4 genes, NDC80, FBXO5, NCAPG and DLGAP5, were also identified in module 1.

Table I.

Two modules with P<0.05 were identified.

Module	ΔS	P-value
1	0.2429241	0.043
2	0.1019656	0.025

ΔS indicates differential network entropy between normal and disease group.

Figure 1.

Dysregulated module 1.

Figure 2.

Dysregulated module 2.

Discussion

In this study, by integrating gene expression data and PPI data, we identified 48,778 PPI interactions involving 7,953 genes. In the network analysis, 24 target modules were identified. In the entropy analysis, two differential modules between the type 1 diabetes group and the normal group were obtained, and module 2 was part of module 1. Therefore, module 1, which consisted of 7 genes (CCNB1, CDC45, GINS2, NDC80, FBXO5, NCAPG and DLGAP5), was the most significant module and may help in understanding the mechanism of type 1 diabetes. This method is based on network entropy, which performs better than other network metrics in characterizing the inflammatory network, as proposed by Jin et al (16). CCNB1 encodes a regulatory protein, cyclin B1, which forms a complex with p34 (Cdk1) to form the maturation-promoting factor (MPF). Once activated through dephosphorylation by the phosphatase Cdc25, the complex promotes several events of early mitosis (18). CCNB1 has been found to be significantly upregulated in non-obese diabetic mesenchymal stem cells, and genetic variants in CCNB1 have been proposed to be associated with increased reporter gene expression through binding of the transcription factor nuclear factor-Y, which elevated fasting plasma glucose in humans (3). In addition, in the non-obese diabetic mouse study, NDC80, CCNB1, FBXO5, NCAPG and CDC45 (19) were involved in the cell cycle, which promoted the development of type 1 diabetes mellitus (3). Although no evidence has shown that GINS2 is correlated with diabetes, its expression was downregulated by high glucose in retinal pigment epithelial cell lines (3). NDC80 encodes a component of the NDC80 kinetochore complex, which functions to organize and stabilize microtubule-kinetochore interactions and is required for proper chromosome segregation (NCBI Gene Database).
The NUF2 gene, which also encodes a component of the NDC80 kinetochore complex, was reported to be upregulated in diabetic HUVECs compared with normal HUVECs (3). Thus, NDC80 may play a similar role in diabetes. FBXO5 encodes a member of the F-box protein family. A chromosomal duplication of FBXO5 was detected among 15 patients with Mayer-Rokitansky-Kuster-Hauser syndrome (3). In addition, diabetes has been reported to cause malformations of the Mullerian ducts in females (20). Therefore, we suggest that FBXO5 may also function in diabetes. NCAPG encodes a component of condensin I, a large protein complex involved in chromosome condensation. Several single nucleotide polymorphisms (SNPs) near the NCAPG gene were associated with type 2 diabetes (3). However, whether the gene plays a key role in type 1 diabetes requires further study. DLGAP5 encodes a kinetochore protein that stabilizes microtubules in the vicinity of chromosomes. In adrenocortical tumors, DLGAP5 was identified as a diagnostic marker since it was differentially expressed between recurring and non-recurring tumors (3). However, no studies have shown a function of DLGAP5 in diabetes. In conclusion, one dysregulated module was identified using the network-based entropy analysis, and it was considered to play a key role in type 1 diabetes progression. This module may serve as a therapeutic indicator for type 1 diabetes. Nevertheless, the present study has limitations. The sample size was small, which may affect the conclusions to some degree. Additionally, the results need further validation with more clinical evidence.

Competing interests

The authors declare that they have no competing interests.

19 in total

1. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias.

Authors: B M Bolstad; R A Irizarry; M Astrand; T P Speed
Journal: Bioinformatics Date: 2003-01-22 Impact factor: 6.937

Review 2. Type 1 diabetes in the young: the harvest of sorrow goes on.

Authors: E A M Gale
Journal: Diabetologia Date: 2005-08 Impact factor: 10.122

Review 3. Forkhead transcription factors: key players in development and metabolism.

Authors: Peter Carlsson; Margit Mahlapuu
Journal: Dev Biol Date: 2002-10-01 Impact factor: 3.582

Review 4. Genetics of type 1A diabetes.

Authors: Patrick Concannon; Stephen S Rich; Gerald T Nepom
Journal: N Engl J Med Date: 2009-04-16 Impact factor: 91.245