
Penalized and weighted K-means for clustering with scattered objects and prior information in high-throughput biological data.

George C Tseng

Abstract

MOTIVATION: Cluster analysis is one of the most important data mining tools for investigating high-throughput biological data. In such high-dimensional, complex settings, the presence of many scattered objects that should not be clustered has been found to hinder the performance of most traditional clustering algorithms. Additional prior knowledge from databases or previous experiments is often available as well. Excluding scattered objects and incorporating existing prior information are therefore desirable to enhance clustering performance.
RESULTS: In this article, a class of loss functions is proposed for cluster analysis and applied to high-throughput genomic and proteomic data. Two major extensions of K-means are involved: penalization and weighting. An additive penalty term allows a set of scattered objects to remain unclustered. Weights are introduced to incorporate prior information about preferred or prohibited cluster patterns. The relationship of these extensions to the classification likelihood of Gaussian mixture models is explored. Incorporating good prior information is also shown to help with the global optimization problem in clustering. Applications of the proposed method to simulated data and to high-throughput data sets from tandem mass spectrometry (MS/MS) and microarray experiments are presented. The results demonstrate superior performance over most existing methods, as well as computational simplicity and extensibility when applied to large, complex biological data sets.

AVAILABILITY: http://www.pitt.edu/~ctseng/research/software.html

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
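A minimal Python sketch of the penalization and weighting ideas described in the abstract may help. It assumes an objective of the form sum over clusters j of sum over objects i in C_j of w_i * ||x_i - c_j||^2, plus lam * |S|. Here S is the set of scattered (unclustered) objects, c_j is the center of cluster j, w_i is a per-object weight and lam is the penalty. This form is an illustrative assumption, not a transcription of the published loss. In particular, positive per-object weights are only a simplified stand-in for the paper's prior-information weights. Under this assumed objective, an object whose weighted squared distance to its nearest center exceeds lam is cheaper to leave in S, and the cost-minimizing center of each cluster is the weighted mean of its members. The function name penalized_kmeans and all of its parameters are hypothetical.

```python
import numpy as np


def penalized_kmeans(X, k, lam, weights=None, init=None, n_iter=100, seed=0):
    """Hypothetical sketch of penalized, optionally weighted, K-means.

    Assumed objective (illustrative, not the published loss verbatim):
        sum_j sum_{i in C_j} w_i * ||x_i - c_j||^2  +  lam * |S|
    where S is the set of scattered objects. Returns (labels, centers);
    label -1 marks a scattered (unclustered) object. Weights must be positive.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    if init is None:
        # Start from k distinct objects chosen at random.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.array(init, dtype=float)
    labels = np.full(n, -2)  # sentinel so the first pass never "converges"
    for _ in range(n_iter):
        # Squared Euclidean distance from every object to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        cost = w * d2[np.arange(n), nearest]
        # Keeping object i costs w_i * d2; leaving it scattered costs lam.
        new_labels = np.where(cost <= lam, nearest, -1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(len(centers)):
            members = labels == j
            if members.any():
                # The weighted mean minimizes sum_i w_i * ||x_i - c||^2.
                centers[j] = np.average(X[members], axis=0, weights=w[members])
    return labels, centers


# Two tight groups plus one far-away object.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [50, -50]]
labels, centers = penalized_kmeans(X, k=2, lam=4.0, init=[[0, 0], [10, 10]])
print(labels.tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

Like standard K-means, this alternating scheme only reaches a local optimum that depends on the starting centers, which is why the sketch accepts explicit initial centers. A larger lam admits more objects into clusters, and with lam set to infinity no object is ever scattered, so the sketch reduces to ordinary (weighted) K-means.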

Year:  2007        PMID: 17597097     DOI: 10.1093/bioinformatics/btm320

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  29 in total

1.  Module-based prediction approach for robust inter-study predictions in microarray data.

Authors:  Zhibao Mi; Kui Shen; Nan Song; Chunrong Cheng; Chi Song; Naftali Kaminski; George C Tseng
Journal:  Bioinformatics       Date:  2010-08-17       Impact factor: 6.937

2.  A data-mining scheme for identifying peptide structural motifs responsible for different MS/MS fragmentation intensity patterns.

Authors:  Yingying Huang; George C Tseng; Shinsheng Yuan; Ljiljana Pasa-Tolic; Mary S Lipton; Richard D Smith; Vicki H Wysocki
Journal:  J Proteome Res       Date:  2007-12-04       Impact factor: 4.466

3.  Dynamic and complex transcription factor binding during an inducible response in yeast.

Authors:  Li Ni; Can Bruce; Christopher Hart; Justine Leigh-Bell; Daniel Gelperin; Lara Umansky; Mark B Gerstein; Michael Snyder
Journal:  Genes Dev       Date:  2009-06-01       Impact factor: 11.361

4.  Network-based multiple locus linkage analysis of expression traits.

Authors:  Wei Pan
Journal:  Bioinformatics       Date:  2009-03-31       Impact factor: 6.937

5.  Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery.

Authors:  Zhiguang Huo; George Tseng
Journal:  Ann Appl Stat       Date:  2017-07-20       Impact factor: 2.083

Review 6.  A Survey of Data Mining and Deep Learning in Bioinformatics.

Authors:  Kun Lan; Dan-Tong Wang; Simon Fong; Lian-Sheng Liu; Kelvin K L Wong; Nilanjan Dey
Journal:  J Med Syst       Date:  2018-06-28       Impact factor: 4.460

7.  A Network Approach to Wound Healing.

Authors:  Tomasz Arodz; Danail Bonchev; Robert F Diegelmann
Journal:  Adv Wound Care (New Rochelle)       Date:  2013-11       Impact factor: 4.730

8.  Semi-supervised gene shaving method for predicting low variation biological pathways from genome-wide data.

Authors:  Dongxiao Zhu
Journal:  BMC Bioinformatics       Date:  2009-01-30       Impact factor: 3.169

9.  A comparison of four clustering methods for brain expression microarray data.

Authors:  Alexander L Richards; Peter Holmans; Michael C O'Donovan; Michael J Owen; Lesley Jones
Journal:  BMC Bioinformatics       Date:  2008-11-25       Impact factor: 3.169

10.  Dynamically weighted clustering with noise set.

Authors:  Yijing Shen; Wei Sun; Ker-Chau Li
Journal:  Bioinformatics       Date:  2009-12-09       Impact factor: 6.937
