Caio Santiago, Vivian Pereira, Luciano Digiampietri.
Abstract
Homologous sequences are widely used to understand the functions of genes and proteins. However, there is no consensus on how to solve the problem of automatically assigning functions to proteins, and different algorithms identify homologous clusters in a given set of sequences in different ways. In this article, we present an algorithm for a specific kind of set: the coding sequences obtained from phylogenetically close genomes (of the same species, genus, or family). When modeled as a graph, these sets have their own characteristics: they form more homogeneous and denser clusters. To cluster such sets, our algorithm makes use of the clustering coefficient, whose maximization can lead to the results expected from a biological point of view. In addition, we present an algorithm for the identification of sequence domains based on graph topology. We also compare our results with those of the TribeMCL tool, a well-established algorithm in the field.
Keywords: clustering coefficient; domain detection; graph modeling; homology detection; local alignment; sequence clustering
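The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the standard local clustering coefficient on an undirected graph, the quantity the abstract says is maximized. It assumes that nodes are sequences and that an edge joins two sequences with a significant local alignment; the function names and the toy sequence identifiers are hypothetical and do not come from the article.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map (node -> set of neighbors)."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-alignments
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: undefined cases count as 0
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * linked / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Toy example (assumed data): three mutually similar sequences
# plus one sequence that matches only s3.
graph = build_graph([("s1", "s2"), ("s2", "s3"), ("s1", "s3"), ("s3", "s4")])
print(local_clustering(graph, "s1"))
print(local_clustering(graph, "s3"))
print(average_clustering(graph))
```

In this toy graph, `s1` sits inside a fully connected triangle and gets the maximum coefficient, while `s3` is pulled down by its loosely attached neighbor `s4`. This is the intuition behind using the coefficient to favor dense, homogeneous clusters.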
Year: 2018 PMID: 30102565 DOI: 10.1089/cmb.2017.0266
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479