Cluster Validation Method for Determining the Number of Clusters in Categorical Sequences.

Gongde Guo, Lifei Chen, Yanfang Ye, Qingshan Jiang.   

Abstract

Cluster validation, the process of evaluating the quality of clustering results, plays an important role in practical machine learning systems. Categorical sequences, such as biological sequences in computational biology, have become common in real-world applications. Unlike previous studies, which mainly focused on attribute-value data, this paper addresses the cluster validation problem for categorical sequences. Evaluating sequence clustering is currently difficult because there is no internal validation criterion defined with respect to the structural features hidden in sequences. To solve this problem, we propose a novel cluster validity index (CVI) as a function of the clustering, which linearly combines intracluster structural compactness and intercluster structural separation to measure the quality of sequence clusters. We also propose a partition-based algorithm for robust clustering of categorical sequences, which supplies the new measure with high-quality clustering results through deterministic initialization and the elimination of noise clusters using an information-theoretic method. The new clustering algorithm and the CVI are then combined within the common model selection procedure to determine the number of clusters in categorical sequence sets. A case study on commonly used protein sequences and experimental results on real-world sequence sets from different domains demonstrate the performance of the proposed method.
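The abstract describes three pieces: a CVI that linearly combines compactness and separation, a partition-based clustering algorithm with deterministic initialization, and a model-selection loop over the number of clusters. The abstract does not give the actual formulas, so the following is only a minimal illustrative sketch under loudly stated assumptions, not the authors' method. It assumes: (1) each sequence's "structural features" are approximated by a normalized bigram (k-mer) frequency profile; (2) compactness is the mean Euclidean distance of sequences to their cluster centroid and separation is the smallest distance between centroids, combined as separation minus weight times compactness (higher is better); (3) a plain k-means-style loop with farthest-first initialization stands in for the paper's clustering algorithm; (4) the information-theoretic elimination of noise clusters is omitted entirely. All function names (kmer_profile, distance, centroid, cluster, validity_index, select_k) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def kmer_profile(seq, k=2):
    """Normalized k-mer frequencies: an ASSUMED stand-in for 'structural features'."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1
    return {kmer: c / total for kmer, c in counts.items()}


def distance(p, q):
    """Euclidean distance between two sparse k-mer profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def centroid(profiles):
    """Element-wise mean of a non-empty list of profiles."""
    acc = {}
    for p in profiles:
        for x, v in p.items():
            acc[x] = acc.get(x, 0.0) + v
    return {x: v / len(profiles) for x, v in acc.items()}


def cluster(profiles, k, iters=20):
    """Hypothetical k-means-style partitioning with deterministic farthest-first init."""
    centers = [profiles[0]]
    while len(centers) < k:
        # Pick the profile farthest from its nearest existing center.
        far = max(range(len(profiles)),
                  key=lambda i: min(distance(profiles[i], c) for c in centers))
        centers.append(profiles[far])
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: distance(p, centers[j])) for p in profiles]
        if new == labels:
            break  # assignments are stable
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = centroid(members)
    return labels


def validity_index(profiles, labels, weight=1.0):
    """Hypothetical linear CVI: separation minus weighted compactness (higher is better)."""
    groups = {}
    for p, lab in zip(profiles, labels):
        groups.setdefault(lab, []).append(p)
    cents = {lab: centroid(g) for lab, g in groups.items()}
    # Compactness: mean distance of each sequence to its own cluster centroid.
    compactness = sum(distance(p, cents[lab])
                      for p, lab in zip(profiles, labels)) / len(profiles)
    if len(cents) < 2:
        return -weight * compactness
    # Separation: smallest distance between any two cluster centroids.
    separation = min(distance(a, b) for a, b in combinations(cents.values(), 2))
    return separation - weight * compactness


def select_k(seqs, k_max):
    """Model selection: cluster for each k in [2, k_max], keep the best-scoring k."""
    profiles = [kmer_profile(s) for s in seqs]
    best = None
    for k in range(2, min(k_max, len(seqs)) + 1):
        labels = cluster(profiles, k)
        score = validity_index(profiles, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    if best is None:
        raise ValueError("need k_max >= 2 and at least 2 sequences")
    return best[1], best[2]
```

For example, on the toy set ["ABABABAB", "ABABABAA", "BABABABA", "CCCCCCCD", "CCCCCCCC", "DCCCCCCC"], select_k(seqs, k_max=4) returns k = 2, because any finer partition splits a tight group and collapses the minimum centroid separation. Using the minimum (rather than mean) centroid distance is one simple way to keep a linear compactness/separation index from rewarding ever-larger k; the paper's own index may behave differently.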

Year:  2016        PMID: 28114078     DOI: 10.1109/TNNLS.2016.2608354

Source DB:  PubMed          Journal:  IEEE Trans Neural Netw Learn Syst        ISSN: 2162-237X            Impact factor:   10.451


  3 in total

1.  Automatic Annotation of Unlabeled Data from Smartphone-Based Motion and Location Sensors.

Authors:  Nsikak Pius Owoh; Manmeet Mahinderjit Singh; Zarul Fitri Zaaba
Journal:  Sensors (Basel)       Date:  2018-07-03       Impact factor: 3.576

2.  Farm management practices, biosecurity and influenza A virus detection in swine farms: a comprehensive study in Colombia.

Authors:  Ciuoderis-Aponte Karl; Diaz Andres; Muskus Carlos; Mario Peña; Hernández-Ortiz Juan; Osorio Jorge
Journal:  Porcine Health Manag       Date:  2022-10-05

3.  Clustering by phenotype and genome-wide association study in autism.

Authors:  Akira Narita; Masato Nagai; Satoshi Mizuno; Soichi Ogishima; Gen Tamiya; Masao Ueki; Rieko Sakurai; Satoshi Makino; Taku Obara; Mami Ishikuro; Chizuru Yamanaka; Hiroko Matsubara; Yasutaka Kuniyoshi; Keiko Murakami; Fumihiko Ueno; Aoi Noda; Tomoko Kobayashi; Mika Kobayashi; Takuma Usuzaki; Hisashi Ohseto; Atsushi Hozawa; Masahiro Kikuya; Hirohito Metoki; Shigeo Kure; Shinichi Kuriyama
Journal:  Transl Psychiatry       Date:  2020-08-17       Impact factor: 6.222