
Cluster analysis and phylogenetic relationship in biomarker identification of type 2 diabetes and nephropathy.

Satya Vani Guttula¹, Allam Appa Rao, G R Sridhar, M S Chakravarthy, Kunjum Nageshwararo, Paturi V Rao.

Abstract

This article describes cluster analysis of DNA microarray data, in which statistical algorithms arrange genes according to the similarity of their expression patterns and the output is displayed graphically. Hierarchical clustering is a multivariate tool often used in phylogenetics and comparative genomics to relate the evolution of species. The patterns seen in microarray expression data can be interpreted as indications of the status of the genes responsible for nephropathy in peripheral blood cells in type 2 diabetes (T2DN). Out of 415 genes expressed in total on the 3 DNA chips, it was concluded that only 116 genes are expressed in T2DN, and of those only 50 are functional genes. These 50 functional genes are responsible for diabetic nephropathy; among them, some of the more highly expressed and responsible genes are AGXT: Alanine-glyoxylate aminotransferase; RHOD: Ras homolog gene family; CAPN6: Calpain 6; EFNB2: Ephrin-B2; ANXA7: Annexin A7; PEG10: Paternally expressed 10; DPP4: Dipeptidyl-peptidase 4 (CD26, adenosine deaminase complexing protein 2); ENSA: Endosulfine alpha; IGFBP2: Insulin-like growth factor binding protein 2, 36kDa; CENPB: Centromere protein B, 80kDa; MLL3: Myeloid/lymphoid or mixed-lineage leukemia 3; BDNF: Brain-derived neurotrophic factor; EIF4A2: Eukaryotic translation initiation factor 4A, isoform 2; and PPP2R1A: Protein phosphatase 2 (formerly 2A), regulatory subunit A, alpha isoform. The nucleotide sequences of the fifty genes were taken from NCBI and a phylogenetic tree was constructed using CLUSTAL W. The distances between the genes are small, from which it is concluded that, based on sequence similarity and evolution, the genes are expressed similarly. A literature survey was carried out for each gene in OMIM, and the genes responsible for diabetic nephropathy are listed.


Keywords: Cluster analysis; microarray; phylogenetic relation; type 2 diabetes and nephropathy

Year: 2010 PMID: 20431808 PMCID: PMC2859286 DOI: 10.4103/0973-3930.60003

Source DB: PubMed Journal: Int J Diabetes Dev Ctries ISSN: 1998-3832

Background

Nephropathy (T2DN) is a frequent complication of diabetes mellitus. Renal failure in diabetes is mediated by multiple pathways. The risk factors for progression of chronic kidney disease (CKD) in type 2 diabetes mellitus (DM) have not been fully elucidated. Although uncontrolled blood pressure (BP) is known to be deleterious, other factors may become more important once BP is treated. Asian Indians with type 2 diabetes mellitus (T2D) have higher susceptibility to diabetic nephropathy (T2DN), the leading cause of end stage renal disease and morbidity in diabetes. Peripheral blood cells (PBCs) play an important role in diabetes, yet very little is known about the molecular mechanisms of PBCs involved in insulin homeostasis. In this study, we examined the global gene expression changes in PBCs in diabetes and diabetic nephropathy to identify potential candidate genes, their expression, and their phylogenetic relationship according to the different clusters in diabetes and nephropathy. We utilized the gene expression values from our earlier publication.[1]

Microarrays

High throughput techniques are becoming more and more important in many areas of basic and applied biomedical research. Microarray techniques using cDNAs are high throughput approaches for large scale gene expression analysis and enable the investigation of the mechanisms of fundamental processes and the molecular basis of disease on a genomic scale. Several clustering techniques have been used to analyze microarray data. As gene chips become more routine in basic research, it is important for biologists to understand the biostatistical methods used to analyze these data so that they can better interpret the biological meaning of the results.

Strategies for analyzing gene chip data can be broadly grouped into two categories: discrimination and clustering. Discrimination requires that the data consist of two components. The first is the gene expression measurements from the chips run on a set of samples. The second is data characterizing the samples. For this method, the goal is to use a mathematical model to predict a sample characteristic from the expression values. There are a large number of statistical and computational approaches for discrimination, ranging from classical linear discriminant analysis to modern machine learning approaches such as support vector machines and artificial neural networks.

In clustering, the data consist only of the gene expression values. The analytical goal is to find clusters of samples or clusters of genes such that observations within a cluster are more similar to each other than they are to observations in different clusters. Cluster analysis can be viewed as a data reduction method in that the observations in a cluster can be represented by an ‘average’ of the observations in that cluster. There are a large number of statistical and computational approaches available for clustering. These include hierarchical clustering and k-means clustering from the statistical literature, and self-organizing maps and artificial neural networks from the machine learning literature. While these algorithms are relatively equivalent in terms of performance, the focus of this paper is on hierarchical clustering.[2]
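The data reduction idea mentioned above, representing a cluster by an ‘average’ of its members, can be illustrated with a minimal Python sketch. The function name `cluster_centroid` and the expression values are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
from statistics import mean


def cluster_centroid(profiles):
    """Return the per-sample mean of a list of expression profiles.

    Each profile is a list of expression values, one per sample,
    and all profiles are assumed to have the same length.
    """
    if not profiles:
        raise ValueError("a cluster must contain at least one profile")
    # zip(*profiles) groups the values sample by sample (column by column).
    return [mean(column) for column in zip(*profiles)]


# Hypothetical expression values for three genes in three comparisons.
cluster = [
    [1.0, 2.0, 0.5],
    [1.2, 1.8, 0.7],
    [0.8, 2.2, 0.3],
]
centroid = cluster_centroid(cluster)
print(centroid)
```

The single `centroid` profile then stands in for all three genes of the cluster, which is the sense in which clustering reduces the data.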

Materials and Methods

Microarray gene expression values were taken from Paturi V Rao's paper, "Gene expression profiles of peripheral blood cells in type 2 diabetes and nephropathy in Asian Indians". The data analyzed here were collected on spotted DNA microarrays. Additional Data File 1 contains the 416 genes expressed in three DNA microarray comparisons: T2D vs. C, T2DN vs. C, and T2DN vs. T2D. These 416 genes and their expression values were supplied to the Cluster 3.0 tool and a dendrogram was generated.

Hierarchical clustering

Cluster analysis is often used to bring similar individuals into groups. In hierarchical clustering, individuals are successively merged on the basis of a dissimilarity matrix computed from the data, giving a dendrogram of nested clusters. In the context of microarray analysis, it is used to classify unknown genes or cases of disease. Several different algorithms will produce a hierarchical clustering from a pair-wise distance matrix. The algorithms begin with each gene by itself in a separate cluster. These clusters correspond to the tips of the clustering tree (dendrogram). The algorithms search the distance matrix for the pair of genes that have the smallest distance between them and merge these two genes into a cluster. The distance matrix is then updated and the search is repeated until all genes belong to a single cluster. Many algorithms follow this series of steps to produce a hierarchical clustering of data. We will consider an average linkage algorithm, one of many hierarchical clustering algorithms that operate by iteratively merging the genes or gene clusters with the smallest distance between them and then updating the distance matrix.
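The merge steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used by Cluster 3.0: it assumes Euclidean distance between expression profiles, uses hypothetical gene names and values, and recomputes the average pairwise distance between clusters at every step instead of updating a stored distance matrix (for average linkage the two give the same distances).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles: dict mapping a gene name to its list of expression values.
    Returns the merges as (cluster_a, cluster_b, distance) tuples in the
    order they were made; each cluster is a tuple of gene names.
    """
    # Start with every gene alone in its own cluster (the dendrogram tips).
    clusters = [(name,) for name in profiles]

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between members of c1 and c2.
        total = sum(euclidean(profiles[g], profiles[h]) for g in c1 for h in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical expression values for four genes in three comparisons.
profiles = {
    "geneA": [0.0, 0.0, 0.0],
    "geneB": [0.0, 1.0, 0.0],
    "geneC": [5.0, 5.0, 5.0],
    "geneD": [5.0, 7.0, 5.0],
}
merges = average_linkage(profiles)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

Read from first to last, the merge list describes the dendrogram from its tips to its root: the two closest genes are joined first, and the final merge joins the two remaining clusters.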

Heat maps

Hierarchical clustering is used to produce what have been called ‘heat maps’ in papers reporting on microarray data analyses. The heat map presents a grid of colored points in which each color represents a level of expression. In this case, the three columns represent samples and the rows represent the 416 genes. The color at a particular point (i.e., row by column coordinate) represents the level of expression for that gene (row) in that sample (column), with red corresponding to high expression, green to low expression, and black to an intermediate level of expression. The ordering of the rows and columns was determined using hierarchical clustering, and the associated dendrogram for the samples is shown. In this example, 3 samples were clustered. The heat map gives an overall view of the expression levels of the 416 genes.[3] When the 416 genes were supplied to the Cluster 3.0 tool, they were divided into four clusters. We selected the fourth cluster because its genes are highly expressed (red) in T2DN. It contains 116 genes, which are listed with their functions in Additional Data File 2. Of these 116 genes, only 50 are functional; these data are stored in Additional Data File 3.
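The color convention described above (red for high, green for low, black for intermediate expression) can be sketched as a simple mapping from an expression value to an RGB triple. The sketch assumes, as a labeled assumption, that expression is given as a ratio-like value centered on zero and that values beyond a chosen `limit` are shown at full intensity. The function name and the `limit` parameter are hypothetical and do not describe the settings of any particular viewer.

```python
def expression_to_rgb(value, limit=2.0):
    """Map an expression value to an (R, G, B) heat-map color.

    Assumption: `value` is centered on zero, so positive values mean high
    expression (red), negative values mean low expression (green), and
    zero is intermediate (black). Values beyond +/- limit are clipped.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Scale to the range [-1, 1], clipping extreme values.
    scaled = max(-1.0, min(1.0, value / limit))
    intensity = round(abs(scaled) * 255)
    if scaled > 0:
        return (intensity, 0, 0)   # shades of red
    if scaled < 0:
        return (0, intensity, 0)   # shades of green
    return (0, 0, 0)               # black


# One hypothetical gene (row) across three comparisons (columns).
row = [2.0, 0.0, -2.0]
colors = [expression_to_rgb(v) for v in row]
print(colors)
```

Applying the function to every cell of the gene-by-sample table, with rows and columns ordered by the dendrograms, gives the grid of colored points that makes up the heat map.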

Results

Gene expression profiling of mRNA from PBCs from six diabetics with nephropathy (T2DN), six diabetics without nephropathy (T2D) and six non-diabetic subjects (C), using 13,824 human sequence-verified cDNA clones, revealed significant differential expression of 416 genes.[1] Hierarchical clustering of the significant genes revealed distinct gene expression signatures for diabetes and diabetic nephropathy. A phylogenetic relationship between the gene clusters is shown with distances, and a cladogram is constructed [Figure 1].

Figure 1

Cluster Tree View Result – The List of genes expressed in type 2 diabetic nephropathy

The 50 functional genes were then taken along with their sequences, and a phylogenetic tree was constructed using CLUSTAL W, a multiple sequence alignment tool available on the internet.[5] The phylogenetic tree with its distances is displayed in Figure 2, and the distances are given in Additional Data File 4. Finally, the OMIM database was searched for the role of each functional gene in humans. All 50 genes are closely related, as the distances between them are small, so their rates of expression are also similar. From the study of the genes in the OMIM database, some of the genes identified as more responsible for diabetic nephropathy are AGXT: Alanine-glyoxylate aminotransferase; RHOD: Ras homolog gene family; CAPN6: Calpain 6; EFNB2: Ephrin-B2; ANXA7: Annexin A7; PEG10: Paternally expressed 10; DPP4: Dipeptidyl-peptidase 4 (CD26, adenosine deaminase complexing protein 2); ENSA: Endosulfine alpha; IGFBP2: Insulin-like growth factor binding protein 2, 36kDa; CENPB: Centromere protein B, 80kDa; MLL3: Myeloid/lymphoid or mixed-lineage leukemia 3; BDNF: Brain-derived neurotrophic factor; EIF4A2: Eukaryotic translation initiation factor 4A, isoform 2; and PPP2R1A: Protein phosphatase 2 (formerly 2A), regulatory subunit A, alpha isoform.
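To make concrete what a distance between sequences means, the following Python sketch computes a simple pairwise distance (the proportion of differing positions) between sequences that are already aligned. It is an illustration only and is not CLUSTAL W's own procedure: the sequences are short hypothetical strings, the names `p_distance` and `distance_matrix` are invented for this example, and positions containing a gap (`-`) in either sequence are ignored by assumption.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def distance_matrix(aligned):
    """Return a dict of pairwise distances keyed by (name_1, name_2)."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


# Hypothetical aligned sequences, not taken from the 50 genes.
aligned = {
    "seq1": "ATGCATGC",
    "seq2": "ATGCATGA",
    "seq3": "ATG-TTGA",
}
matrix = distance_matrix(aligned)
for (m, n), d in matrix.items():
    if m < n:
        print(m, n, round(d, 3))
```

A tree-building method then groups the sequences with the smallest distances first, so small values in such a matrix correspond to short branches between the sequences in the resulting tree.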

Discussion

We have focused on presenting an overview of hierarchical clustering of microarray data, emphasizing the relationship between a dendrogram and spatial representations of genes. We believe this relationship provides an intuitive understanding of how to analyze microarray data and can make it easier to interpret the results of a cluster analysis in a biological framework. The fact that the ‘heat maps’ found in most microarray publications are based on hierarchical clustering indicates that an understanding of this general method is valuable to those who are just beginning to read the microarray literature and even to those who are using supervised methods. We used the cluster analysis software Cluster 3.0, which is available online from the Eisen laboratory. Identification of candidate genes in peripheral blood could provide easily accessible biomarkers to monitor diabetic nephropathy. These candidates are AGXT: Alanine-glyoxylate aminotransferase;[6] RHOD: Ras homolog gene family;[7] CAPN6: Calpain 6;[8] EFNB2: Ephrin-B2;[9,10] ANXA7: Annexin A7;[11] PEG10: Paternally expressed 10;[12] DPP4: Dipeptidyl-peptidase 4 (CD26, adenosine deaminase complexing protein 2);[13] ENSA: Endosulfine alpha;[14] IGFBP2: Insulin-like growth factor binding protein 2, 36kDa;[15] CENPB: Centromere protein B, 80kDa;[16] MLL3: Myeloid/lymphoid or mixed-lineage leukemia 3;[17] BDNF: Brain-derived neurotrophic factor;[18] EIF4A2: Eukaryotic translation initiation factor 4A, isoform 2;[19] and PPP2R1A: Protein phosphatase 2 (formerly 2A), regulatory subunit A, alpha isoform.[20] The phylogenetic tree shows that genes with similar expression values are also evolutionarily related.[4]

Figure 2

Clustal W: Phylogenetic tree of the 50 genes

17 in total

1. RhoD regulates endosome dynamics through Diaphanous-related Formin and Src tyrosine kinase.

Authors: Stéphane Gasman; Yannis Kalaidzidis; Marino Zerial
Journal: Nat Cell Biol Date: 2003-03 Impact factor: 28.824

2. Sequence analysis, expression and chromosomal localization of a gene, isolated from a subtracted human retina cDNA library, that encodes an insulin-like growth factor binding protein (IGFBP2).

Authors: N Agarwal; C L Hsieh; D Sills; M Swaroop; B Desai; U Francke; A Swaroop
Journal: Exp Eye Res Date: 1991-05 Impact factor: 3.467

3. Molecular cloning of a ligand for the EPH-related receptor protein-tyrosine kinase Htk.

Authors: B D Bennett; F C Zeigler; Q Gu; B Fendly; A D Goddard; N Gillett; W Matthews
Journal: Proc Natl Acad Sci U S A Date: 1995-03-14 Impact factor: 11.205

4. Primary hyperoxaluria type 1: a cluster of new mutations in exon 7 of the AGXT gene.

Authors: C von Schnakenburg; G Rumsby
Journal: J Med Genet Date: 1997-06 Impact factor: 6.318

5. A new subfamily of vertebrate calpains lacking a calmodulin-like domain: implications for calpain regulation and evolution.

Authors: N Dear; K Matena; M Vingron; T Boehm
Journal: Genomics Date: 1997-10-01 Impact factor: 5.736

6. Calcium channel activity of purified human synexin and structure of the human synexin gene.

Authors: A L Burns; K Magendzo; A Shirvan; M Srivastava; E Rojas; M R Alijani; H B Pollard
Journal: Proc Natl Acad Sci U S A Date: 1989-05 Impact factor: 11.205

7. Isolation and mapping of the human EIF4A2 gene homologous to the murine protein synthesis initiation factor 4A-II gene Eif4a2.

Authors: K Sudo; E Takahashi; Y Nakamura
Journal: Cytogenet Cell Genet Date: 1995

8. Genomic organization, exact localization, and tissue expression of the human CD26 (dipeptidyl peptidase IV) gene.

Authors: C A Abbott; E Baker; G R Sutherland; G W McCaughan
Journal: Immunogenetics Date: 1994 Impact factor: 2.846

9. Isolation of LERK-5: a ligand of the eph-related receptor tyrosine kinases.

Authors: D P Cerretti; T Vanden Bos; N Nelson; C J Kozlosky; P Reddy; E Maraskovsky; L S Park; S D Lyman; N G Copeland; D J Gilbert
Journal: Mol Immunol Date: 1995-11 Impact factor: 4.407

10. Human alpha-endosulfine, a possible regulator of sulfonylurea-sensitive KATP channel: molecular cloning, expression and biological properties.

Authors: L Heron; A Virsolvy; K Peyrollier; F M Gribble; A Le Cam; F M Ashcroft; D Bataille
Journal: Proc Natl Acad Sci U S A Date: 1998-07-07 Impact factor: 11.205
