
A dynamically growing self-organizing tree (DGSOT) for hierarchical clustering gene expression profiles.

Feng Luo, Latifur Khan, Farokh Bastani, I-Ling Yen, Jizhong Zhou.

Abstract

MOTIVATION: The increasing use of microarray technologies is generating large amounts of data that must be processed in order to extract useful and rational fundamental patterns of gene expression. Hierarchical clustering is one method used to analyze gene expression data, but traditional hierarchical clustering algorithms suffer from several drawbacks (e.g. a fixed topology structure, and misclustered data that cannot be reevaluated). In this paper, we introduce a new hierarchical clustering algorithm that overcomes some of these drawbacks.

RESULTS: We propose a new tree-structured self-organizing neural network, called the dynamically growing self-organizing tree (DGSOT) algorithm, for hierarchical clustering. The DGSOT constructs a hierarchy from top to bottom by division. At each hierarchical level, the DGSOT optimizes the number of clusters, from which the proper hierarchical structure of the underlying dataset can be found. In addition, we propose a new cluster validation criterion based on the geometric property of the Voronoi partition of the dataset in order to find the proper number of clusters at each hierarchical level. This criterion uses the minimum spanning tree (MST) concept from graph theory and is computationally inexpensive for large datasets. A K-level up distribution (KLD) mechanism, which increases the scope of data distribution during hierarchy construction, is used to improve clustering accuracy: it allows data misclustered in the early stages to be reevaluated at a later stage, which increases the accuracy of the final clustering result. The clustering result of the DGSOT is easily displayed as a dendrogram for visualization. On a yeast cell cycle microarray expression dataset, we found that our algorithm extracts gene expression patterns at different levels. Furthermore, the biological functionality enrichment in the clusters is considerably high and the hierarchical structure of the clusters is more reasonable.
AVAILABILITY: DGSOT is available upon request from the authors.
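The abstract names an MST-based validation criterion for choosing the number of clusters at each level but does not give its formula. The following is a minimal sketch of one plausible reading, under stated assumptions; it is not the authors' DGSOT code. Plain k-means stands in for the self-organizing tree's splitting step, and the index (mean point-to-centroid distance divided by the mean edge length of a minimum spanning tree over the centroids, lower is better) is a hypothetical stand-in for the paper's Voronoi-based criterion. The function names `kmeans`, `mst_mean_edge`, `validity` and `choose_k` are illustrative only. Measuring separation along MST edges means each centroid is compared mainly with a nearby centroid rather than with every other one, and the MST over k centroids costs only O(k²) distance evaluations.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain k-means with deterministic farthest-point seeding.

    Hypothetical stand-in for the DGSOT splitting step, not the paper's method.
    """
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        # Next seed: the point whose nearest existing seed is farthest away.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(iters):
        # Assign each point to its nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids match them
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = mean(members)
    return centroids, labels


def mst_mean_edge(centroids: List[Point]) -> float:
    """Mean edge length of a minimum spanning tree over >= 2 centroids (Prim's algorithm)."""
    n = len(centroids)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge joining each vertex to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], dist(centroids[u], centroids[v]))
    return total / (n - 1)  # a spanning tree on n vertices has n - 1 edges


def validity(points: List[Point], centroids: List[Point], labels: List[int]) -> float:
    """Hypothetical index: compactness / MST separation (lower is better)."""
    compactness = sum(dist(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)
    separation = mst_mean_edge(centroids)
    return math.inf if separation == 0 else compactness / separation


def choose_k(points: List[Point], k_max: int) -> int:
    """Return the k in 2..k_max whose k-means partition minimises the index."""
    scores: Dict[int, float] = {}
    for k in range(2, k_max + 1):
        centroids, labels = kmeans(points, k)
        scores[k] = validity(points, centroids, labels)
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Three tight groups of four points around (0, 0), (10, 0) and (0, 10).
    blobs: List[Point] = []
    for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]:
        blobs += [(cx + 0.5, cy), (cx - 0.5, cy), (cx, cy + 0.5), (cx, cy - 0.5)]
    print(choose_k(blobs, 4))  # → 3
```

With k_max = 4 the index is lowest at k = 3. Over-splitting a true group (k = 4) adds a very short MST edge between its two halves, which lowers the mean separation more than it improves compactness, so the over-split partition scores worse. In a divisive scheme such as the one the abstract describes, a selection step like `choose_k` would be applied separately at each node as the hierarchy grows.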


Year:  2004        PMID: 15130935     DOI: 10.1093/bioinformatics/bth292

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  Transcriptome analysis of microRNAs in developing cerebral cortex of rat.

Authors:  Mao-Jin Yao; Gang Chen; Ping-Ping Zhao; Ming-Hua Lu; Jiang Jian; Mo-Fang Liu; Xiao-Bing Yuan
Journal:  BMC Genomics       Date:  2012-06-12       Impact factor: 3.969

2.  A novel approach for data integration and disease subtyping.

Authors:  Tin Nguyen; Rebecca Tagett; Diana Diaz; Sorin Draghici
Journal:  Genome Res       Date:  2017-10-24       Impact factor: 9.043

3.  A Novel Method for Cancer Subtyping and Risk Prediction Using Consensus Factor Analysis.

Authors:  Duc Tran; Hung Nguyen; Uyen Le; George Bebis; Hung N Luu; Tin Nguyen
Journal:  Front Oncol       Date:  2020-06-24       Impact factor: 6.244

4.  A Robust Manifold Graph Regularized Nonnegative Matrix Factorization Algorithm for Cancer Gene Clustering.

Authors:  Rong Zhu; Jin-Xing Liu; Yuan-Ke Zhang; Ying Guo
Journal:  Molecules       Date:  2017-12-02       Impact factor: 4.411

5.  Molecular Subtyping and Outlier Detection in Human Disease Using the Paraclique Algorithm.

Authors:  Ronald D Hagan; Michael A Langston
Journal:  Algorithms       Date:  2021-02-19

6.  Integrated Cancer Subtyping using Heterogeneous Genome-Scale Molecular Datasets.

Authors:  Suzan Arslanturk; Sorin Draghici; Tin Nguyen
Journal:  Pac Symp Biocomput       Date:  2020
