INCA: new statistic for estimating the number of clusters and identifying atypical units.

I Irigoien, C Arenas.

Abstract

This paper presents a solution to two problems that arise in the classification of data such as tumor types, samples of gene expression profiles, or general biomedical data: first, estimating the real number of clusters in a data set, and second, deciding whether a new unit belongs to one of the previously identified clusters or is an outlier (atypical unit). We propose a new statistic that solves both problems. Because our approach is based on a measure of distance or dissimilarity between any pair of units, it can be applied to any kind of multivariate data (continuous, binary, or multi-attribute) and has applications in many biomedical fields. We validated the approach on simulated examples and applied it to the diagnosis of dermal diseases and to the analysis of lymphatic cancer data, where it performed well. (c) 2007 John Wiley & Sons, Ltd.

Year:  2008        PMID: 18050154     DOI: 10.1002/sim.3143

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.373


  3 in total

1.  Multivariate hypergeometric similarity measure.

Authors:  Chanchala D Kaddi; R Mitchell Parry; May D Wang
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2013 Nov-Dec       Impact factor: 3.710

2.  ICGE: an R package for detecting relevant clusters and atypical units in gene expression.

Authors:  Itziar Irigoien; Basilio Sierra; Concepción Arenas
Journal:  BMC Bioinformatics       Date:  2012-02-13       Impact factor: 3.169

3.  Towards application of one-class classification methods to medical data.

Authors:  Itziar Irigoien; Basilio Sierra; Concepción Arenas
Journal:  ScientificWorldJournal       Date:  2014-03-20