A fast hierarchical clustering algorithm for large-scale protein sequence data sets.

Abstract

TRIBE-MCL is a Markov clustering algorithm that operates on a graph built from pairwise similarity information of the input data. Edge weights stored in the stochastic similarity matrix are alternately fed to the two main operations, inflation and expansion, and are normalized in each main loop to maintain the probabilistic constraint. In this paper we propose an efficient implementation of the TRIBE-MCL clustering algorithm, suitable for fast and accurate grouping of protein sequences. A modified sparse matrix structure is introduced that can efficiently handle most operations of the main loop. Taking advantage of the symmetry of the similarity matrix, a fast matrix squaring formula is also introduced to facilitate the time-consuming expansion. The proposed algorithm was tested on protein sequence databases such as SCOP95. In terms of efficiency, the proposed solution improves execution speed by two orders of magnitude compared to recently published efficient solutions, reducing the total runtime to well below 1 min in the case of the 11,944 proteins of SCOP95. This improvement in computation time is achieved without any loss of partition quality. Convergence is generally reached in approximately 50 iterations. The efficient execution enabled us to perform a thorough evaluation of classification results and to formulate recommendations regarding the choice of the algorithm's parameter values.
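
The expansion, inflation and normalization loop described in the abstract can be illustrated with a minimal sketch of generic Markov clustering. This is not the paper's implementation: it uses plain dense lists rather than the modified sparse matrix structure, and ordinary matrix squaring rather than the fast symmetric formula. The function names, the default inflation value of 2.0, the tolerance and the cluster-extraction threshold are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    for j in range(n):
        col_sum = sum(m[i][j] for i in range(n))
        if col_sum > 0.0:
            for i in range(n):
                m[i][j] /= col_sum
    return m


def expand(m):
    """Expansion: square the matrix (a two-step random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise each entry to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes of a similarity graph; returns sorted lists of node indices."""
    n = len(similarity)
    m = [[float(similarity[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        if m[i][i] == 0.0:
            m[i][i] = 1.0  # assumed self-loop so every column has mass
    normalize_columns(m)

    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:  # matrix stopped changing: treat as converged
            break

    # Rows that keep non-negligible weight act as attractors;
    # the columns they reach form one cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return [sorted(c) for c in clusters]


# Toy input (hypothetical): two fully connected groups with no links between them.
sim = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
clusters = mcl(sim)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

The dense representation costs cubic time per expansion, which is exactly the step the abstract identifies as time-consuming; a realistic implementation for thousands of proteins would need a sparse structure of the kind the paper proposes.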

Keywords: Efficient computing; Markov clustering; Markov processes; Protein sequence clustering; Sparse matrix

Substances: Proteins

Year: 2014 PMID: 24657908 DOI: 10.1016/j.compbiomed.2014.02.016

Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589

Cited

3 in total

1. Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm.

Authors: Theodore R Gibbons; Stephen M Mount; Endymion D Cooper; Charles F Delwiche
Journal: BMC Bioinformatics Date: 2015-07-10 Impact factor: 3.169

2. Improved multi-objective clustering algorithm using particle swarm optimization.

Authors: Congcong Gong; Haisong Chen; Weixiong He; Zhanliang Zhang
Journal: PLoS One Date: 2017-12-05 Impact factor: 3.240

3. Genome-Enhanced Detection and Identification (GEDI) of plant pathogens.

Authors: Nicolas Feau; Stéphanie Beauseigle; Marie-Josée Bergeron; Guillaume J Bilodeau; Inanc Birol; Sandra Cervantes-Arango; Braham Dhillon; Angela L Dale; Padmini Herath; Steven J M Jones; Josyanne Lamarche; Dario I Ojeda; Monique L Sakalidis; Greg Taylor; Clement K M Tsui; Adnan Uzunovic; Hesther Yueh; Philippe Tanguay; Richard C Hamelin
Journal: PeerJ Date: 2018-02-22 Impact factor: 2.984