Harun Pirim, Burak Ekşioğlu, Andy D Perkins.
Abstract
Addressing important challenges in bioinformatics requires high-throughput data technologies that interpret biological data efficiently and reliably. Clustering is widely used as a first step in interpreting high-dimensional biological data, such as gene expression data measured by microarrays. A good clustering algorithm should be efficient, reliable, and effective, as demonstrated by its ability to identify biologically relevant clusters. This paper proposes a new minimum spanning tree-based heuristic, B-MST, guided by an innovative objective function: the tightness and separation index (TSI). The TSI makes use of co-expression network topology to obtain biologically meaningful clusters, and the paper develops a local search procedure to minimize the TSI value. B-MST is evaluated using (1) the adjusted Rand index (ARI) for microarray data sets with known object classes, and (2) Gene Ontology (GO) annotations for data sets without documented object classes.
Keywords: Biological networks; Clustering; Gene expression data; Graph mining; Heuristics
Year: 2015 PMID: 25912991 DOI: 10.1016/j.compbiomed.2015.03.031
Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589
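
The abstract describes clustering on a minimum spanning tree and scoring the result with the adjusted Rand index when true classes are known. The sketch below illustrates that general idea only; it is not the paper's B-MST heuristic. The abstract does not define the TSI or the local search, so this example substitutes the classic rule of cutting the k-1 heaviest MST edges. The function names, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb, dist


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def mst_edges(points):
    """Kruskal's algorithm on the complete Euclidean graph; returns (weight, i, j) edges."""
    parent = list(range(len(points)))
    candidates = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for w, i, j in candidates:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:  # edge joins two components, so it belongs to the MST
            parent[ri] = rj
            tree.append((w, i, j))
    return tree


def mst_clusters(points, k):
    """Assumed stand-in for the paper's objective: drop the k-1 heaviest MST edges."""
    tree = sorted(mst_edges(points))
    kept = tree[: len(tree) - (k - 1)]
    parent = list(range(len(points)))
    for _, i, j in kept:
        parent[_find(parent, i)] = _find(parent, j)
    roots = {}
    # Number the connected components in order of first appearance.
    return [roots.setdefault(_find(parent, i), len(roots)) for i in range(len(points))]


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index computed from the pair-counting contingency table."""
    n = len(truth)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
truth = [0, 0, 0, 1, 1, 1]
labels = mst_clusters(points, k=2)
print(labels, adjusted_rand_index(truth, labels))
```

On real expression data one would typically replace the Euclidean distance with a co-expression dissimilarity (for example, one minus a correlation coefficient); that choice is likewise an assumption, not something the abstract specifies.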