
Cluster-C, an algorithm for the large-scale clustering of protein sequences based on the extraction of maximal cliques.

S Mohseni-Zadeh, P Brézellec, J-L Risler.

Abstract

Although the characterization of proteins cannot rely solely on sequence similarity, it has been widely shown that massive all-vs-all sequence comparisons can be an effective approach and a good basis for predicting biochemical functions or delineating shared properties. The program Cluster-C presented here enables the stand-alone and efficient construction of protein families within whole proteomes. The algorithm, which is based on the detection of cliques, ensures a high level of connectivity within the clusters. In contrast to the single transitive linkage method, Cluster-C allows a large number of sequences to be classified in such a way that multidomain proteins do not produce a chain-grouping effect resulting in meaningless clusters. Moreover, some proteins can be present in several different but relevant clusters, which helps in determining their functional domains. In the present analysis we used the Z-value, an evaluation of the significance of the similarity score, as the criterion for connecting sequences (the user can freely define the threshold of the similarity criterion). The clusters built with a rather low threshold (Z = 14) include more than 97% of the sequences and are consistent with known protein families and PROSITE patterns.
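The contrast between clique-based grouping and single transitive linkage can be illustrated with a small sketch. This is not the Cluster-C implementation: the sequence names, the Z-values, and the helper functions (`build_graph`, `maximal_cliques`, `single_linkage`) are all hypothetical, and a textbook Bron-Kerbosch enumeration stands in for whatever clique extraction the program actually uses. Only the threshold Z = 14 is taken from the abstract.

```python
def build_graph(z_scores, threshold):
    """Connect two sequences when their pairwise Z-value meets the threshold."""
    graph = {}
    for (a, b), z in z_scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if z >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with Bron-Kerbosch (with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def single_linkage(graph):
    """Single transitive linkage: connected components of the same graph."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(graph[v] - comp)
        seen |= comp
        components.append(frozenset(comp))
    return components


# Hypothetical toy data: A1 and A2 carry domain A, B1 and B2 carry domain B,
# and AB is a multidomain protein carrying both. Z-values are invented.
z_scores = {
    ("A1", "A2"): 30, ("A1", "AB"): 22, ("A2", "AB"): 20,
    ("B1", "B2"): 28, ("B1", "AB"): 19, ("B2", "AB"): 21,
    ("A1", "B1"): 3, ("A1", "B2"): 2, ("A2", "B1"): 4, ("A2", "B2"): 1,
}

graph = build_graph(z_scores, threshold=14)  # Z = 14 as in the abstract
cliques = maximal_cliques(graph)
components = single_linkage(graph)

print("maximal cliques:", [sorted(c) for c in cliques])
print("single linkage: ", [sorted(c) for c in components])
```

On this toy input, single linkage chains all five sequences into one group through the multidomain protein AB, whereas clique extraction yields two clusters, {A1, A2, AB} and {B1, B2, AB}, with AB belonging to both. That mirrors the behaviour described in the abstract, under the stated assumptions.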


Year:  2004        PMID: 15261151     DOI: 10.1016/j.compbiolchem.2004.03.002

Source DB:  PubMed          Journal:  Comput Biol Chem        ISSN: 1476-9271            Impact factor:   2.877


  6 in total

1.  Different binding properties and function of CXXC zinc finger domains in Dnmt1 and Tet1.

Authors:  Carina Frauer; Andrea Rottach; Daniela Meilinger; Sebastian Bultmann; Karin Fellinger; Stefan Hasenöder; Mengxi Wang; Weihua Qin; Johannes Söding; Fabio Spada; Heinrich Leonhardt
Journal:  PLoS One       Date:  2011-02-02       Impact factor: 3.240

2.  Integrating overlapping structures and background information of words significantly improves biological sequence comparison.

Authors:  Qi Dai; Lihua Li; Xiaoqing Liu; Yuhua Yao; Fukun Zhao; Michael Zhang
Journal:  PLoS One       Date:  2011-11-10       Impact factor: 3.240

3.  SEQOPTICS: a protein sequence clustering system.

Authors:  Yonghui Chen; Kevin D Reilly; Alan P Sprague; Zhijie Guan
Journal:  BMC Bioinformatics       Date:  2006-12-12       Impact factor: 3.169

4.  Both simulation and sequencing data reveal coinfections with multiple SARS-CoV-2 variants in the COVID-19 pandemic.

Authors:  Yinhu Li; Yiqi Jiang; Zhengtu Li; Yonghan Yu; Jiaxing Chen; Wenlong Jia; Yen Kaow Ng; Feng Ye; Shuai Cheng Li; Bairong Shen
Journal:  Comput Struct Biotechnol J       Date:  2022-03-18       Impact factor: 7.271

5.  Non mycobacterial virulence genes in the genome of the emerging pathogen Mycobacterium abscessus.

Authors:  Fabienne Ripoll; Sophie Pasek; Chantal Schenowitz; Carole Dossat; Valérie Barbe; Martin Rottman; Edouard Macheras; Beate Heym; Jean-Louis Herrmann; Mamadou Daffé; Roland Brosch; Jean-Loup Risler; Jean-Louis Gaillard
Journal:  PLoS One       Date:  2009-06-19       Impact factor: 3.240

6.  Automatic identification of highly conserved family regions and relationships in genome wide datasets including remote protein sequences.

Authors:  Tunca Doğan; Bilge Karaçalı
Journal:  PLoS One       Date:  2013-09-12       Impact factor: 3.240

