Object Weighting: A New Clustering Approach to Deal with Outliers and Cluster Overlap in Computational Biology.

Alexandre Gondeau, Zahia Aouabed, Mohamed Hijri, Pedro Peres-Neto, Vladimir Makarenkov.   

Abstract

Considerable efforts have been made over recent decades to improve the robustness of clustering algorithms against noise features and outliers, which are known to be important sources of error in clustering. Outliers dominate the sum-of-squares calculations and generate cluster overlap, thus leading to unreliable clustering results. They can be particularly detrimental in computational biology, e.g., when determining the number of clusters in gene expression data related to cancer or when inferring phylogenetic trees and networks. While the issue of feature weighting has been studied in detail, no clustering methods using object weighting have been proposed yet. Here we describe a new general data partitioning method that includes an object-weighting step to assign higher weights to outliers and objects that cause cluster overlap. Different object weighting schemes, based on the Silhouette cluster validity index, the median and two intercluster distances, are defined. We compare our novel technique to a number of popular and efficient clustering algorithms, such as K-means, X-means, DAPC and Prediction Strength. In the presence of outliers and cluster overlap, our method largely outperforms X-means, DAPC and Prediction Strength as well as the K-means algorithm based on feature weighting.
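
To make the idea of Silhouette-based object weighting concrete, the sketch below computes the standard Silhouette width of each object for a given partition and then maps it to a weight, so that objects lying far from their own cluster or close to another cluster receive higher weights. This is only an illustration under stated assumptions: the abstract does not give the authors' weighting formulas, so the mapping `(1 - s) / 2` and the helper names `silhouette_widths` and `object_weights` are hypothetical and are not taken from the paper.

```python
import math


def silhouette_widths(points, labels):
    """Return the standard Silhouette width s(i) of every object.

    s(i) = (b - a) / max(a, b), where a is the mean distance from object i
    to the other members of its own cluster and b is the smallest mean
    distance from object i to the members of any other cluster.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own or len(clusters) < 2:
            # Common convention: singleton clusters get a width of 0.
            widths.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def object_weights(widths):
    """Hypothetical mapping (an assumption, not the paper's formula).

    Rescales s(i) from [-1, 1] to a weight in [0, 1] so that poorly
    placed objects (low Silhouette width) receive higher weights.
    """
    return [(1.0 - s) / 2.0 for s in widths]


# Toy data: two tight groups plus one object lying between them.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
labels = [0, 0, 0, 1, 1, 1, 0]

widths = silhouette_widths(points, labels)
weights = object_weights(widths)
```

In this toy partition the object at `(5, 5)` sits between the two groups, so it has the lowest Silhouette width and therefore the highest weight under the assumed mapping.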

Year: 2021   PMID: 31180868   PMCID: PMC8158064   DOI: 10.1109/TCBB.2019.2921577

Source DB: PubMed   Journal: IEEE/ACM Trans Comput Biol Bioinform   ISSN: 1545-5963   Impact factor: 3.710


