Abstract
This paper presents a solution to two problems that arise in the classification of data such as types of tumor, samples of gene expression profiles, or general biomedical data: first, estimating the true number of clusters in a data set; and second, deciding whether a new unit belongs to one of the previously identified clusters or is an outlier (atypical unit). We propose a new statistic that allows us to solve both problems. Because our approach is based on a measure of distance or dissimilarity between any pair of units, it can be applied to any kind of multivariate data (continuous, binary, or multi-attribute) and has applications in many biomedical fields. We validated the approach on simulated examples and applied it to the diagnosis of dermal diseases and to the analysis of lymphatic cancer data, where it performed well. Copyright © 2007 John Wiley & Sons, Ltd.
Year: 2008 PMID: 18050154 DOI: 10.1002/sim.3143
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373