Accelerating high-dimensional clustering with lossless data reduction.

Bahjat F Qaqish¹, Jonathon J O'Brien², Jonathan C Hibbard¹, Katie J Clowers².

Abstract

MOTIVATION: For cluster analysis, high-dimensional data are associated with instability, decreased classification accuracy and high computational burden. The latter challenge can be eliminated as a serious concern. For applications where dimension reduction techniques are not implemented, we propose a temporary transformation which accelerates computations with no loss of information. The algorithm can be applied to any statistical procedure that depends only on Euclidean distances and can be implemented sequentially to enable analyses of data that would otherwise exceed memory limitations.
RESULTS: The method is easily implemented in common statistical software as a standard pre-processing step. The benefit of our algorithm grows with the dimensionality of the problem and the complexity of the analysis. Consequently, our simple algorithm not only decreases the computation time for routine analyses, it opens the door to performing calculations that may have otherwise been too burdensome to attempt.
AVAILABILITY AND IMPLEMENTATION: R, Matlab and SAS/IML code for implementing lossless data reduction is freely available in the Appendix.
CONTACT: obrienj@hms.harvard.edu.
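
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' Appendix code; it only shows one way to obtain a distance-preserving reduction, under these assumptions: there are n observations in p dimensions with n much smaller than p, the n rows are linearly independent, and the reduced data are taken to be the Cholesky factor of the Gram matrix X Xᵀ. Because squared Euclidean distances depend only on the entries of X Xᵀ, any n × n matrix Z with Z Zᵀ = X Xᵀ has the same pairwise distances between rows as X. The Gram matrix is accumulated over column chunks to mimic the sequential use mentioned above. The names `gram_matrix`, `cholesky`, `euclidean` and the `chunk` parameter are illustrative, not taken from the paper.

```python
import math
import random


def gram_matrix(rows, chunk=1000):
    """Return G = X X^T, accumulated one block of columns at a time."""
    n, p = len(rows), len(rows[0])
    g = [[0.0] * n for _ in range(n)]
    for start in range(0, p, chunk):
        # Only this slice of columns is needed at any one time.
        block = [r[start:start + chunk] for r in rows]
        for i in range(n):
            for j in range(i + 1):
                s = sum(a * b for a, b in zip(block[i], block[j]))
                g[i][j] += s
                if i != j:
                    g[j][i] += s
    return g


def cholesky(g):
    """Return lower-triangular L with L L^T = G (assumes G is positive definite)."""
    n = len(g)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("rows are not linearly independent")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Synthetic example: 5 observations in 2000 dimensions.
random.seed(0)
n, p = 5, 2000
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

# Reduced representation: n x n instead of n x p.
Z = cholesky(gram_matrix(X, chunk=256))

# Largest discrepancy between pairwise distances before and after reduction.
max_diff = max(
    abs(euclidean(X[i], X[j]) - euclidean(Z[i], Z[j]))
    for i in range(n)
    for j in range(i)
)
print(len(Z), len(Z[0]), max_diff)
```

Any procedure that uses only Euclidean distances between observations, such as k-means or hierarchical clustering, could then be run on `Z` in place of `X`. If the rows are linearly dependent, for example after centering, the Cholesky step in this sketch fails and a rank-revealing factorization would be needed instead.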

Entities: Chemical

Substances: Fungal Proteins

Year: 2017 PMID: 28520900 PMCID: PMC5870568 DOI: 10.1093/bioinformatics/btx328

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

References

10 in total

1. Principal component analysis for clustering gene expression data.

Authors: K Y Yeung; W L Ruzzo
Journal: Bioinformatics Date: 2001-09 Impact factor: 6.937

2. Quantitative mass spectrometry-based multiplexing compares the abundance of 5000 S. cerevisiae proteins across 10 carbon sources.

Authors: Joao A Paulo; Jeremy D O'Connell; Robert A Everley; Jonathon O'Brien; Micah A Gygi; Steven P Gygi
Journal: J Proteomics Date: 2016-07-16 Impact factor: 4.044

3. Tight clustering: a resampling-based approach for identifying stable and tight patterns in data.

Authors: George C Tseng; Wing H Wong
Journal: Biometrics Date: 2005-03 Impact factor: 2.571

4. Evaluation and comparison of gene clustering methods in microarray analysis.

Authors: Anbupalam Thalamuthu; Indranil Mukhopadhyay; Xiaojing Zheng; George C Tseng
Journal: Bioinformatics Date: 2006-07-31 Impact factor: 6.937

5. What is principal component analysis?

Authors: Markus Ringnér
Journal: Nat Biotechnol Date: 2008-03 Impact factor: 54.908

6. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking.

Authors: Matthew D Wilkerson; D Neil Hayes
Journal: Bioinformatics Date: 2010-04-28 Impact factor: 6.937

7. Quantitative temporal viromics: an approach to investigate host-pathogen interaction.

Authors: Michael P Weekes; Peter Tomasec; Edward L Huttlin; Ceri A Fielding; David Nusinow; Richard J Stanton; Eddie C Y Wang; Rebecca Aicheler; Isa Murrell; Gavin W G Wilkinson; Paul J Lehner; Steven P Gygi
Journal: Cell Date: 2014-06-05 Impact factor: 41.582