Application of centrality measures in the identification of critical genes in diabetes mellitus.

Chintagunta Ambedkar¹, Kiran Kumar Reddi², Naresh Babu Muppalaneni³, Duggineni Kalyani³.

Abstract

The connectivity of a protein and its structure are related to its functional properties. Many experimental approaches have been employed for the identification of Diabetes Mellitus (DM) associated candidate genes. It is therefore of interest to apply various graph centrality measures to the network of genes associated with human Diabetes Mellitus in order to identify potential targets. For this analysis we used 2728 genes known to cause Diabetes Mellitus, taken from Jensenlab (Novo Nordisk Foundation Center for Protein Research, Denmark). A protein-protein interaction network with 1020 nodes was then constructed using the tool Centralities in Biological Networks (CentiBiN) after eliminating duplicates, parallel edges, self-loop edges and entries with unknown Human Protein Reference Database (HPRD) IDs. We used fourteen centrality measures, which are useful for identifying the structural characteristics of individual nodes in the network. The results of the centrality measures are highly correlated. We thus identified genes that are critically associated with DM, and we report the top ten genes for each of the fourteen centrality measures for further consideration as targets for DM.

Year: 2015 PMID: 25848169 PMCID: PMC4369684 DOI: 10.6026/97320630011090

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

People face major life-threatening diseases such as diabetes, cancer, hypertension, heart disease and stroke [1]. We chose Diabetes Mellitus for our study. Diabetes Mellitus is a group of metabolic diseases characterized by hyperglycemia resulting from defects in insulin secretion, insulin action, or both. The chronic hyperglycemia of diabetes is associated with long-term damage, dysfunction, and failure of various organs, especially the eyes, kidneys, nerves, heart, and blood vessels. Because the risk of cardiovascular disease is much higher for people with diabetes, it is crucial that blood pressure and cholesterol levels are monitored regularly [2]. Diabetes Mellitus is not a single disease but a group of disorders with glucose intolerance in common.

Many online databases are available for researching genes across species, providing access to information about the phenotypes, pathways, and variations of many genes. Before the candidate-gene approach was fully developed, various other methods were used to identify genes linked to disease states. However, these methods are not as beneficial when studying complex diseases, for several reasons. In this setting, candidate-gene approaches have proved useful in identifying risk variants associated with diseases of interest such as dementia, cancer, diabetes, asthma, and hypertension. The candidate-gene approach to genetic association studies focuses on associations between genetic variation within prespecified genes of interest and phenotypes or disease states [3, 4].

With the rapid growth of human protein interaction data, this complexity can be addressed through protein-protein interaction networks (PPINs). The function and activity of a protein are often modulated by other proteins with which it interacts [5, 6]. Data can be represented as networks in which the vertices (e.g. transcripts, proteins or metabolites) are linked by edges (correlations, interactions or reactions, respectively). Structural analysis of networks can lead to new insights into biological systems and is a helpful method for proposing new hypotheses [7-10].

Methodology

Proteins are the nodes of biological networks, and their importance becomes apparent only when the relationship between essentiality and topological properties of the network, such as the degree distribution, clustering coefficients, centrality measures, and community structures, is studied [9]. Network centralities are used to rank the elements of a network according to a given concept of importance [11]. However, the use of centralities as a structural analysis method for biological networks is controversial, and several centrality measures should be considered within an exploratory process [16]. Because both biological networks and centrality calculations are complex, a tool is needed to support such analysis. We therefore used CentiBiN, an application for the calculation and visualization of centralities for biological networks.

The human protein interaction data were obtained from the Human Protein Reference Database (HPRD). We chose the HPRD dataset mainly because it focuses on a set of likely true protein-protein interactions (PPIs) by generating subnetworks around proteins of interest. HPRD is a centralized platform for visually depicting and integrating information on the domain architecture, post-translational modifications, interaction networks and disease association of each protein in the human proteome [17]. We followed the procedure shown in Figure 1 to identify the critical genes for Diabetes Mellitus.

Figure 1

Flow Chart

Data Set:

We extracted the human genes involved in Diabetes Mellitus from the DISEASES database maintained by the Jensen Group (Jensenlab) at the Novo Nordisk Foundation Center for Protein Research, Denmark. DISEASES is a frequently updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. We mined the database for genes causing Diabetes Mellitus and obtained 2728 genes, which reduced to 2017 genes after duplicate entries were eliminated.
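The duplicate-elimination step can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name, the case-insensitive comparison, and the toy records are assumptions made for the example, and the input is not the actual DISEASES export.

```python
def deduplicate_genes(gene_symbols):
    """Return gene symbols with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for symbol in gene_symbols:
        # Assumption: symbols are compared after trimming and upper-casing.
        key = symbol.strip().upper()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Toy records for illustration only (not data from the study).
raw_records = ["INS", "TCF7L2", "ins", "PPARG", "TCF7L2 ", "KCNJ11"]
unique_genes = deduplicate_genes(raw_records)
print(len(raw_records), "->", len(unique_genes))
```

Keeping first-seen order makes the result reproducible from the same export, which helps when the list is later matched against another database.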

Network Properties and Centrality Measures:

We calculated fourteen graph centrality measures using the tool CentiBiN: degree, eccentricity, closeness, radiality, centroid value, stress, shortest-path betweenness, current-flow closeness, current-flow betweenness, Katz status index, eigenvector, HITS-authority, HITS-hubs and PageRank. They are defined as follows [12–13, 15, 16, 18–32].

Degree

Eccentricity

Closeness

Radiality

Stress

Shortest-path betweenness

Current-flow closeness, in which $p_{vt}(v) - p_{vt}(t)$ equals the potential difference.

Current-flow betweenness, in which $\tau_{st}(v)$ equals the fraction of electrical current running over vertex $v$ in the network.

Katz status index

Eigenvector: $\lambda C_{IV} = A C_{IV}$

Centroid: $C_{cen}(v) = \min\{ f(v,w) : w \in V \setminus \{v\} \}$, where $f(v,w) = \gamma_v(w) - \gamma_w(v)$ and $\gamma_v(w)$ denotes the number of vertices that are closer to $v$ than to $w$.

PageRank: $C_{pr} = d P C_{pr} + (1 - d)\vec{1}$, where $P$ is the transition matrix and $d$ is the damping factor.

HITS-hubs: $C_{hubs} = A C_{auths}$

HITS-authority: $C_{auths} = A^{T} C_{hubs}$
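For orientation, the shortest-path, current-flow and Katz measures named above are commonly defined as shown below. These are the usual textbook forms, given here as an assumption; the exact normalization implemented in CentiBiN may differ. Here $\operatorname{dist}(v,w)$ is the shortest-path distance, $n$ the number of vertices, $\Delta_G$ the diameter of the graph, $\sigma_{st}$ the number of shortest paths from $s$ to $t$, $\sigma_{st}(v)$ the number of those paths passing through $v$, $A$ the adjacency matrix, and $\alpha$ an attenuation factor chosen small enough for the Katz series to converge.

```latex
\begin{align*}
C_{deg}(v)  &= \deg(v) \\
C_{ecc}(v)  &= \frac{1}{\max_{w \in V} \operatorname{dist}(v,w)} \\
C_{clo}(v)  &= \frac{1}{\sum_{w \in V} \operatorname{dist}(v,w)} \\
C_{rad}(v)  &= \frac{\sum_{w \in V} \bigl(\Delta_G + 1 - \operatorname{dist}(v,w)\bigr)}{n-1} \\
C_{str}(v)  &= \sum_{s \neq v} \sum_{t \neq v} \sigma_{st}(v) \\
C_{spb}(v)  &= \sum_{s \neq v} \sum_{t \neq v} \frac{\sigma_{st}(v)}{\sigma_{st}} \\
C_{cfc}(v)  &= \frac{n-1}{\sum_{t \neq v} \bigl(p_{vt}(v) - p_{vt}(t)\bigr)} \\
C_{cfb}(v)  &= \frac{1}{(n-1)(n-2)} \sum_{s,t \in V} \tau_{st}(v) \\
C_{katz}    &= \sum_{k=1}^{\infty} \alpha^{k} \bigl(A^{T}\bigr)^{k} \vec{1}
\end{align*}
```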

Correlation analysis of centrality measures:

Correlation is a statistical technique that shows whether, and how strongly, pairs of variables are related. The fourteen centrality measures were calculated for every node in the interactome, and the nodes were ranked by their scores. Pairwise correlation between the centrality measures was obtained through Spearman's rank correlation coefficient $\rho$, which is defined as

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n (n^{2} - 1)}$$

where $d_i$ is the difference between the two ranks of node $i$ and $n$ is the number of nodes.
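A minimal sketch of Spearman's coefficient in its rank-difference form is given below. It assumes that no two nodes share the same score under a given measure; real centrality scores (degree in particular) contain many ties, for which tied values are usually given average ranks and the closed form no longer applies exactly. The function names and the score vectors are illustrative, not taken from the study.

```python
def ranks(scores):
    """Rank scores from 1 (highest) to n. Assumes there are no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(x, y):
    """Spearman's rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    rank_x, rank_y = ranks(x), ranks(y)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Toy scores for five nodes under three hypothetical measures.
measure_a = [5, 3, 4, 1, 2]
measure_b = [0.50, 0.40, 0.45, 0.20, 0.30]  # same ordering as measure_a
measure_c = [0.30, 0.25, 0.10, 0.15, 0.20]  # partly different ordering

rho_ab = spearman_rho(measure_a, measure_b)
rho_ac = spearman_rho(measure_a, measure_c)
print(rho_ab, rho_ac)
```

Two measures that order the nodes identically give a coefficient of 1, while disagreements in ranking pull the value down in proportion to the squared rank differences.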

Result & Discussion

Using bioDBnet (http://biodbnet.abcc.ncifcrf.gov), we looked up the equivalent HPRD IDs for the 2017 genes and obtained HPRD IDs for 1834 of the 2017 proteins. For the remaining 183 proteins no equivalent HPRD ID was returned, so we searched for their HPRD IDs through their aliases. We still could not find HPRD IDs for 39 proteins because they are new entries in the database. After eliminating duplicates we obtained 1876 unique genes.
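The two-stage lookup described above (direct mapping first, then aliases) can be sketched as below. The lookup tables are hypothetical in-memory stand-ins; the sketch does not call bioDBnet, and the gene names and IDs are placeholders rather than real records.

```python
def map_to_hprd(genes, symbol_to_hprd, aliases):
    """Map each gene to an HPRD ID, falling back to its aliases.

    Returns (mapped, unmapped): a dict gene -> HPRD ID and a list of genes
    for which neither the symbol nor any alias was found.
    """
    mapped, unmapped = {}, []
    for gene in genes:
        hprd_id = symbol_to_hprd.get(gene)
        if hprd_id is None:
            # Fallback: try each known alias in turn.
            for alias in aliases.get(gene, []):
                hprd_id = symbol_to_hprd.get(alias)
                if hprd_id is not None:
                    break
        if hprd_id is None:
            unmapped.append(gene)
        else:
            mapped[gene] = hprd_id
    return mapped, unmapped


# Placeholder tables (hypothetical names and IDs).
symbol_to_hprd = {"GENE_A": "00001", "GENE_B_OLD": "00002", "GENE_D": "00001"}
aliases = {"GENE_B": ["GENE_B_OLD"], "GENE_C": ["GENE_C_ALT"]}

mapped, unmapped = map_to_hprd(
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"], symbol_to_hprd, aliases
)
unique_ids = sorted(set(mapped.values()))  # duplicates collapse to one ID
print(mapped, unmapped, unique_ids)
```

Collecting the mapped IDs into a set mirrors the final duplicate-elimination step, since two gene symbols can resolve to the same HPRD entry.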

PPI Network:

To construct the protein-protein interaction network, we downloaded the interaction database from the HPRD website (http://hprd.org) and deployed it in a local database. We retrieved the PPIs for the 1876 unique proteins, keeping only interactions in which both the source and the sink protein belong to this set. From these we constructed a network in CentiBiN with 1151 vertices and 3389 edges. After eliminating self edges and parallel edges, the final network had 1020 vertices and 2891 edges. Using CentiBiN, we calculated the fourteen graph centrality measures (degree, eccentricity, closeness, radiality, centroid value, stress, shortest-path betweenness, current-flow closeness, current-flow betweenness, Katz status index, eigenvector, HITS-authority, HITS-hubs and PageRank) for this network. The top ten genes for each centrality measure are presented in Table 1 (see supplementary material).
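The network construction and ranking steps can be illustrated with a small pure-Python sketch. It is not CentiBiN: it assumes an undirected graph, covers only degree and closeness, and uses the unnormalized closeness 1 / (sum of distances) over reachable vertices. All function names and the toy interactions are hypothetical.

```python
from collections import deque


def build_simple_graph(interactions, proteins):
    """Keep edges with both ends in `proteins`; drop self and parallel edges."""
    adjacency = {}
    for a, b in interactions:
        if a == b or a not in proteins or b not in proteins:
            continue
        # Sets absorb parallel edges, including reversed duplicates.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def bfs_distances(adjacency, source):
    """Shortest-path distances from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def degree_centrality(adjacency):
    return {v: len(neighbours) for v, neighbours in adjacency.items()}


def closeness_centrality(adjacency):
    result = {}
    for v in adjacency:
        total = sum(bfs_distances(adjacency, v).values())
        result[v] = 1 / total if total else 0.0
    return result


def top_k(scores, k):
    """Top-k vertices by score; ties are broken alphabetically."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


proteins = {"P1", "P2", "P3", "P4"}
interactions = [("P1", "P2"), ("P2", "P1"), ("P1", "P1"), ("P1", "P3"),
                ("P1", "P4"), ("P3", "P4"), ("P2", "P9")]

graph = build_simple_graph(interactions, proteins)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
edge_count = sum(degree.values()) // 2
print(len(graph), edge_count, top_k(degree, 2), top_k(closeness, 2))
```

In this sketch a protein whose only interactions are self edges, or edges to proteins outside the set, does not appear as a vertex at all, so the vertex count can drop when such edges are removed.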

Correlation analysis on centrality properties:

The pairwise correlation coefficients of the fourteen centrality measures for the Diabetes Mellitus network show that all measures are positively correlated, with every correlation value above 0.52 (Table 2, see supplementary material; Figure 2). Here, the difference $d_i$ in Spearman's coefficient is the difference between the ranks of each observation on the two variables, which in this case are the centrality scores.

Figure 2

Correlation between pairs of centrality measures for the Diabetes Mellitus genes whose correlation coefficient is above 0.9: (a) degree vs centroid; (b) degree vs Katz status index; (c) degree vs PageRank; (d) shortest-path (sp) betweenness vs stress; (e) sp betweenness vs PageRank; (f) current-flow (cf) betweenness vs PageRank; (g) Katz status index vs radiality; (h) cf closeness vs degree; (i) eigenvector vs HITS-authority; (j) HITS-authority vs HITS-hubs; (k) cf betweenness vs stress; (l) closeness vs eigenvector; (m) closeness vs HITS-hubs; (n) closeness vs HITS-authority; (o) closeness vs Katz status index.

Conclusion

Many experimental approaches have been used to identify candidate genes in DM. We applied various graph centrality measures to the network of DM-associated genes to identify potential drug targets. We calculated fourteen centrality measures for the constructed network and found them to be positively correlated, with all correlation values greater than 0.52. This helped to identify genes that are highly critical in DM. We thus report the top ten genes for each of the fourteen centrality measures for consideration as potential targets for DM.

References (22 in total)

1. The small world of metabolism.

Authors: D A Fell; A Wagner
Journal: Nat Biotechnol Date: 2000-11 Impact factor: 54.908

2. Network motifs: simple building blocks of complex networks.

Authors: R Milo; S Shen-Orr; S Itzkovitz; N Kashtan; D Chklovskii; U Alon
Journal: Science Date: 2002-10-25 Impact factor: 47.728

Review 3. Protein interactions and disease: computational approaches to uncover the etiology of diseases.

Authors: Maricel G Kann
Journal: Brief Bioinform Date: 2007-07-16 Impact factor: 11.622

4. The human disease network.

Authors: Kwang-Il Goh; Michael E Cusick; David Valle; Barton Childs; Marc Vidal; Albert-László Barabási
Journal: Proc Natl Acad Sci U S A Date: 2007-05-14 Impact factor: 11.205

5. Identification of synthetic lethal pairs in biological systems through network information centrality.

Authors: T Kranthi; S B Rao; P Manimaran
Journal: Mol Biosyst Date: 2013-06-03

Review 6. Network medicine: a network-based approach to human disease.

Authors: Albert-László Barabási; Natali Gulbahce; Joseph Loscalzo
Journal: Nat Rev Genet Date: 2011-01 Impact factor: 53.242

7. Occupational lifestyle diseases: An emerging issue.

Authors: Mukesh Sharma; P K Majumdar
Journal: Indian J Occup Environ Med Date: 2009-12

8. Classification and diagnosis of diabetes mellitus and other categories of glucose intolerance. National Diabetes Data Group.

Authors: National Diabetes Data Group
Journal: Diabetes Date: 1979-12 Impact factor: 9.461