
Are clusters found in one dataset present in another dataset?

Amy V Kapp, Robert Tibshirani.

Abstract

In many microarray studies, a cluster defined on one dataset is sought in an independent dataset. If the cluster is found in the new dataset, the cluster is said to be "reproducible" and may be biologically significant. Classifying a new datum to a previously defined cluster can be seen as predicting which of the previously defined clusters is most similar to the new datum. If the new data classified to a cluster are similar, molecularly or clinically, to the data already in the cluster, then the cluster is reproducible and the corresponding prediction accuracy is high. Here, we take advantage of this connection between reproducibility and prediction accuracy to develop a validation procedure for clusters sought in datasets independent of the one in which they were characterized. We define a cluster quality measure called the "in-group proportion" (IGP) and introduce a general procedure for individually validating clusters. Using simulations and real breast cancer datasets, we compare the IGP to four other popular cluster quality measures (homogeneity score, separation score, silhouette width, and weighted average discrepant pairs score). We also use the simulations and the real breast cancer datasets to compare four versions of the validation procedure, all of which use the IGP but differ in how the null distributions are generated. We find that the IGP is the best measure of prediction accuracy, and that one version of the validation procedure is more widely applicable than the other three. An implementation of this algorithm is available in the R package "clusterRepro" from The Comprehensive R Archive Network (http://cran.r-project.org).
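The abstract names the IGP but does not spell out how it is computed. The sketch below is a minimal illustration under stated assumptions, not the authors' code or the clusterRepro API: it assumes the commonly stated definition (the IGP of a group is the fraction of test samples assigned to that group whose nearest neighbour in the test set is assigned to the same group), assumes that new samples are assigned to the previously defined clusters by nearest centroid, and uses squared Euclidean distance for simplicity. The helper names `assign_to_centroids` and `in_group_proportion` are hypothetical.

```python
import numpy as np


def assign_to_centroids(data, centroids):
    """Assign each new sample to its closest previously defined centroid.

    Assumption: nearest-centroid classification with squared Euclidean distance.
    data: (n_samples, n_features); centroids: (k, n_features).
    """
    data = np.asarray(data, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared distance from every sample to every centroid, shape (n_samples, k).
    d = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def in_group_proportion(data, labels):
    """Hypothetical IGP: per group, the share of members whose nearest
    neighbour (within the new dataset) carries the same group label."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared distances between samples of the new dataset.
    d = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own nearest neighbour
    nearest = d.argmin(axis=1)
    same_group = labels[nearest] == labels
    return {int(g): float(same_group[labels == g].mean()) for g in np.unique(labels)}


# Usage: two centroids from an "original" dataset, six samples from a "new" one.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
new_data = np.array([[0.1, 0.2], [-0.3, 0.1], [0.2, -0.1],
                     [9.8, 10.1], [10.2, 9.9], [10.0, 10.3]])
labels = assign_to_centroids(new_data, centroids)
print(in_group_proportion(new_data, labels))  # {0: 1.0, 1: 1.0}
```

Squared Euclidean distance selects the same nearest neighbour as Euclidean distance, so taking the square root is unnecessary. A correlation-based distance could be swapped in by changing only the two distance computations. The sketch covers only the IGP itself and omits the null-distribution step that the validation procedure uses to judge whether an observed IGP is significant.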


Year: 2006    PMID: 16613834    DOI: 10.1093/biostatistics/kxj029

Source DB: PubMed    Journal: Biostatistics    ISSN: 1465-4644    Impact factor: 5.899


Linked articles: 66 in total (first 10 listed)

1.  Counting clusters using R-NN curves.

Authors:  Rajarshi Guha; Debojyoti Dutta; David J Wild; Ting Chen
Journal:  J Chem Inf Model       Date:  2007-06-30       Impact factor: 4.956

2.  The Immune Subtypes and Landscape of Squamous Cell Carcinoma.

Authors:  Bailiang Li; Yi Cui; Dhanya K Nambiar; John B Sunwoo; Ruijiang Li
Journal:  Clin Cancer Res       Date:  2019-03-04       Impact factor: 12.531

3.  Unsupervised Clustering of Quantitative Image Phenotypes Reveals Breast Cancer Subtypes with Distinct Prognoses and Molecular Pathways.

Authors:  Jia Wu; Yi Cui; Xiaoli Sun; Guohong Cao; Bailiang Li; Debra M Ikeda; Allison W Kurian; Ruijiang Li
Journal:  Clin Cancer Res       Date:  2017-01-10       Impact factor: 12.531

4.  Deep-learning approach to identifying cancer subtypes using high-dimensional genomic data.

Authors:  Runpu Chen; Le Yang; Steve Goodison; Yijun Sun
Journal:  Bioinformatics       Date:  2020-03-01       Impact factor: 6.937

5.  SPARSE INTEGRATIVE CLUSTERING OF MULTIPLE OMICS DATA SETS.

Authors:  Ronglai Shen; Sijian Wang; Qianxing Mo
Journal:  Ann Appl Stat       Date:  2013-04-09       Impact factor: 2.083

6.  Sample-specific perturbation of gene interactions identifies breast cancer subtypes.

Authors:  Yuanyuan Chen; Yu Gu; Zixi Hu; Xiao Sun
Journal:  Brief Bioinform       Date:  2021-07-20       Impact factor: 11.622

7.  [Review] Breast cancer classification and prognostication through diverse systems along with recent emerging findings in this respect; the dawn of new perspectives in the clinical applications.

Authors:  Vida Pourteimoor; Samira Mohammadi-Yeganeh; Mahdi Paryan
Journal:  Tumour Biol       Date:  2016-09-20

8.  Magnetic resonance image features identify glioblastoma phenotypic subtypes with distinct molecular pathway activities.

Authors:  Haruka Itakura; Achal S Achrol; Lex A Mitchell; Joshua J Loya; Tiffany Liu; Erick M Westbroek; Abdullah H Feroze; Scott Rodriguez; Sebastian Echegaray; Tej D Azad; Kristen W Yeom; Sandy Napel; Daniel L Rubin; Steven D Chang; Griffith R Harsh; Olivier Gevaert
Journal:  Sci Transl Med       Date:  2015-09-02       Impact factor: 17.956

9.  Discovery of molecular subtypes in leiomyosarcoma through integrative molecular profiling.

Authors:  A H Beck; C-H Lee; D M Witten; B C Gleason; B Edris; I Espinosa; S Zhu; R Li; K D Montgomery; R J Marinelli; R Tibshirani; T Hastie; D M Jablons; B P Rubin; C D Fletcher; R B West; M van de Rijn
Journal:  Oncogene       Date:  2009-11-09       Impact factor: 9.867

10.  [Review] Prediction of breast cancer metastasis by genomic profiling: where do we stand?

Authors:  Ulrich Pfeffer; Francesco Romeo; Douglas M Noonan; Adriana Albini
Journal:  Clin Exp Metastasis       Date:  2009-03-24       Impact factor: 5.150


Beijing Coyote Bioscience Co., Ltd. (北京卡尤迪生物科技股份有限公司) © 2022-2023.