Inference from clustering with application to gene-expression microarrays.

Edward R Dougherty, Junior Barrera, Marcel Brun, Seungchan Kim, Roberto M Cesar, Yidong Chen, Michael Bittner, Jeffrey M Trent.

Abstract

There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A large amount of generated output is available over the web.
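
The evaluation loop described in the abstract (model each process as a mean plus independent noise, generate points, cluster them, count the misclustered points) can be illustrated with a minimal sketch. This is not the authors' toolbox. The function names, the Gaussian noise, the averaging of replicate measurements, the farthest-point initialisation of K-means, and the matching of clusters to classes by the best label permutation are all assumptions made for the illustration, as are the means and noise level in the demo.

```python
import itertools
import random


def generate_samples(means, sigma, n_per_class, replicates, rng):
    """Each point is its class mean plus independent Gaussian noise.

    Assumption: `replicates` noisy measurements are averaged per point.
    """
    points, labels = [], []
    for label, mean in enumerate(means):
        for _ in range(n_per_class):
            point = []
            for m in mean:
                draws = [m + rng.gauss(0.0, sigma) for _ in range(replicates)]
                point.append(sum(draws) / replicates)
            points.append(point)
            labels.append(label)
    return points, labels


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means; farthest-point initialisation is a simplification."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def clustering_error(true_labels, assignment, k):
    """Misclustered points under the best cluster-to-class matching.

    Assumption: clusters are matched to classes by trying every permutation,
    which is only practical for a small number of classes.
    """
    best = len(true_labels)
    for perm in itertools.permutations(range(k)):
        errors = sum(1 for t, a in zip(true_labels, assignment) if perm[a] != t)
        best = min(best, errors)
    return best


# Illustrative run: hand-picked means, error as replication increases.
means = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0], [0.0, 2.0, 0.0]]
for replicates in (1, 2, 4, 8):
    rng = random.Random(0)
    points, labels = generate_samples(means, 1.0, 30, replicates, rng)
    errors = clustering_error(labels, kmeans(points, len(means)), len(means))
    print(f"replicates={replicates}: {errors} of {len(points)} misclustered")
```

Swapping `kmeans` for another clustering function with the same interface would allow a comparison of algorithms on identical generated data, in the spirit of the comparison the abstract describes.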

Year:  2002        PMID: 11911797     DOI: 10.1089/10665270252833217

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  25 in total

1.  Global analysis of gene expression patterns during disuse atrophy in rat skeletal muscle.

Authors:  Eric J Stevenson; Paul G Giresi; Alan Koncarevic; Susan C Kandarian
Journal:  J Physiol       Date:  2003-07-04       Impact factor: 5.182

2.  In silico development, validation and comparison of predictive QSAR models for lipid peroxidation inhibitory activity of cinnamic acid and caffeic acid derivatives using multiple chemometric and cheminformatics tools.

Authors:  Indrani Mitra; Achintya Saha; Kunal Roy
Journal:  J Mol Model       Date:  2012-03-21       Impact factor: 1.810

3.  CADLIVE dynamic simulator: direct link of biochemical networks to dynamic models.

Authors:  Hiroyuki Kurata; Kouichi Masaki; Yoshiyuki Sumida; Rei Iwasaki
Journal:  Genome Res       Date:  2005-04       Impact factor: 9.043

4.  Validation of computational methods in genomics.

Authors:  Edward R Dougherty; Jianping Hua; Michael L Bittner
Journal:  Curr Genomics       Date:  2007-03       Impact factor: 2.236

5.  Modeling bioconcentration factor (BCF) using mechanistically interpretable descriptors computed from open source tool "PaDEL-Descriptor".

Authors:  Subrata Pramanik; Kunal Roy
Journal:  Environ Sci Pollut Res Int       Date:  2013-10-30       Impact factor: 4.223

6.  Pharmacophore mapping of arylamino-substituted benzo[b]thiophenes as free radical scavengers.

Authors:  Indrani Mitra; Achintya Saha; Kunal Roy
Journal:  J Mol Model       Date:  2010-03-01       Impact factor: 1.810

7.  PCA based population generation for genetic network optimization.

Authors:  Ahammed Sherief Kizhakkethil Youseph; Madhu Chetty; Gour Karmakar
Journal:  Cogn Neurodyn       Date:  2018-04-30       Impact factor: 5.082

8.  Clustering algorithms: on learning, validation, performance, and applications to genomics.

Authors:  Lori Dalton; Virginia Ballarin; Marcel Brun
Journal:  Curr Genomics       Date:  2009-09       Impact factor: 2.236

9.  Gene expression profiling-based identification of cell-surface targets for developing multimeric ligands in pancreatic cancer.

Authors:  Yoganand Balagurunathan; David L Morse; Galen Hostetter; Vijayalakshmi Shanmugam; Phillip Stafford; Sonsoles Shack; John Pearson; Maria Trissal; Michael J Demeure; Daniel D Von Hoff; Victor J Hruby; Robert J Gillies; Haiyong Han
Journal:  Mol Cancer Ther       Date:  2008-09-02       Impact factor: 6.261

10.  Docking and 3D-QSAR studies of acetohydroxy acid synthase inhibitor sulfonylurea derivatives.

Authors:  Kunal Roy; Somnath Paul
Journal:  J Mol Model       Date:  2009-10-20       Impact factor: 1.810
