High-quality sequence clustering guided by network topology and multiple alignment likelihood.

Vincent Miele, Simon Penel, Vincent Daubin, Franck Picard, Daniel Kahn, Laurent Duret.

Abstract

MOTIVATION: Proteins can be naturally classified into families of homologous sequences that derive from a common ancestor. The comparison of homologous sequences and the analysis of their phylogenetic relationships provide useful information regarding the function and evolution of genes. One important difficulty of clustering methods is to distinguish highly divergent homologous sequences from sequences that only share partial homology due to evolution by protein domain rearrangements. Existing clustering methods require parameters that have to be set a priori. Given the variability in the evolution pattern among proteins, these parameters cannot be optimal for all gene families.
RESULTS: We propose a strategy that aims at clustering sequences that are homologous over their entire length, and that takes into account the pattern of substitution specific to each gene family. All sequences are first compared with each other and clustered into pre-families, based on pairwise similarity criteria, with permissive parameters to optimize sensitivity. Pre-families are then divided into homogeneous clusters, based on the topology of the similarity network. Finally, clusters are progressively merged into families, for which we compute multiple alignments, and we use a model selection technique to find the optimal tradeoff between the number of families and multiple alignment likelihood. To evaluate this method, called HiFiX, we analyzed simulated sequences and manually curated datasets. These tests showed that HiFiX is the only method robust to both sequence divergence and domain rearrangements. HiFiX is fast enough to be used on very large datasets.
AVAILABILITY AND IMPLEMENTATION: The Python software HiFiX is freely available at http://lbbe.univ-lyon1.fr/hifix.
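
The staged strategy described under RESULTS can be illustrated with a minimal sketch. This is not the HiFiX implementation: the function names, the `min_score` cutoff, the `penalty` weight and the caller-supplied `log_likelihood` function are all hypothetical, chosen only for illustration. The first function builds pre-families as connected components of a permissively thresholded similarity network. The second greedily merges clusters while a penalized criterion (summed family log-likelihood minus a cost per family) keeps improving, a generic stand-in for the model selection step. The intermediate topology-based splitting of pre-families is omitted.

```python
from itertools import combinations


def pre_families(n_seqs, similarities, min_score):
    """Stage 1 (sketch): connected components of the similarity network.

    `similarities` maps (i, j) index pairs to a pairwise score. An edge is
    kept when the score is >= `min_score`, a hypothetical permissive cutoff.
    """
    parent = list(range(n_seqs))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in similarities.items():
        if score >= min_score:
            parent[find(i)] = find(j)

    groups = {}
    for s in range(n_seqs):
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


def merge_clusters(clusters, log_likelihood, penalty):
    """Final stage (sketch): greedy merging under a penalized criterion.

    criterion = sum of per-family log-likelihoods - penalty * number of families.
    `log_likelihood` is supplied by the caller (assumption: it would score a
    multiple alignment of the family; here it is any function of a member list).
    """
    def criterion(partition):
        return sum(log_likelihood(f) for f in partition) - penalty * len(partition)

    partition = [sorted(c) for c in clusters]
    best = criterion(partition)
    while len(partition) > 1:
        candidates = []
        for a, b in combinations(range(len(partition)), 2):
            merged = [f for k, f in enumerate(partition) if k not in (a, b)]
            merged.append(sorted(partition[a] + partition[b]))
            candidates.append((criterion(merged), merged))
        top_score, top_partition = max(candidates, key=lambda c: c[0])
        if top_score <= best:
            break  # no merge improves the trade-off any further
        best, partition = top_score, top_partition
    return sorted(partition)


# Toy data: five sequences, two "true" families (purely illustrative).
toy_similarities = {(0, 1): 0.9, (1, 2): 0.8, (3, 4): 0.7, (2, 3): 0.1}
toy_truth = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}


def toy_log_likelihood(family):
    # Stand-in score: heavily penalize families mixing unrelated sequences.
    return -10.0 * (len({toy_truth[s] for s in family}) - 1)
```

With the toy data, `pre_families(5, toy_similarities, 0.5)` groups sequences 0 to 2 and 3 to 4, and `merge_clusters([[0], [1, 2], [3], [4]], toy_log_likelihood, 1.0)` recovers the same two families. A real pipeline would replace the toy score with an alignment-based likelihood.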

Substances:
Proteins

Year: 2012 PMID: 22368255 DOI: 10.1093/bioinformatics/bts098

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

13 in total

1. De novo clustering of long reads by gene from transcriptomics data.

Authors: Camille Marchet; Lolita Lecompte; Corinne Da Silva; Corinne Cruaud; Jean-Marc Aury; Jacques Nicolas; Pierre Peterlongo
Journal: Nucleic Acids Res Date: 2019-01-10 Impact factor: 16.971

2. Genome-wide analysis of the Firmicutes illuminates the diderm/monoderm transition.

Authors: Najwa Taib; Daniela Megrian; Jerzy Witwinowski; Panagiotis Adam; Daniel Poppleton; Guillaume Borrel; Christophe Beloin; Simonetta Gribaldo
Journal: Nat Ecol Evol Date: 2020-10-19 Impact factor: 15.460

3. Bioinformatic and mutational studies of related toxin-antitoxin pairs in Mycobacterium tuberculosis predict and identify key functional residues.

Authors: Himani Tandon; Arun Sharma; Saruchi Wadhwa; Raghavan Varadarajan; Ramandeep Singh; Narayanaswamy Srinivasan; Sankaran Sandhya
Journal: J Biol Chem Date: 2019-04-24 Impact factor: 5.157

4. Evaluation and improvements of clustering algorithms for detecting remote homologous protein families.

Authors: Juliana S Bernardes; Fabio R J Vieira; Lygia M M Costa; Gerson Zaverucha
Journal: BMC Bioinformatics Date: 2015-02-05 Impact factor: 3.169

5. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity.

Authors: Janelle B Leuthaeuser; Stacy T Knutson; Kiran Kumar; Patricia C Babbitt; Jacquelyn S Fetrow
Journal: Protein Sci Date: 2015-08-18 Impact factor: 6.725

6. Quantitative synteny scoring improves homology inference and partitioning of gene families.

Authors: Raja Hashim Ali; Sayyed Muhammad; Mehmood Khan; Lars Arvestad
Journal: BMC Bioinformatics Date: 2013-10-15 Impact factor: 3.169

7. Phylogenomic test of the hypotheses for the evolutionary origin of eukaryotes.

Authors: Nicolas C Rochette; Céline Brochier-Armanet; Manolo Gouy
Journal: Mol Biol Evol Date: 2014-01-07 Impact factor: 16.240

8. Ancestral Genome Estimation Reveals the History of Ecological Diversification in Agrobacterium.

Authors: Florent Lassalle; Rémi Planel; Simon Penel; David Chapulliot; Valérie Barbe; Audrey Dubost; Alexandra Calteau; David Vallenet; Damien Mornico; Thomas Bigot; Laurent Guéguen; Ludovic Vial; Daniel Muller; Vincent Daubin; Xavier Nesme
Journal: Genome Biol Evol Date: 2017-12-01 Impact factor: 3.416

9. A pluralistic account of homology: adapting the models to the data. (Review)

Authors: Leanne S Haggerty; Pierre-Alain Jachiet; William P Hanage; David A Fitzpatrick; Philippe Lopez; Mary J O'Connell; Davide Pisani; Mark Wilkinson; Eric Bapteste; James O McInerney
Journal: Mol Biol Evol Date: 2013-11-22 Impact factor: 16.240

10. GenFamClust: an accurate, synteny-aware and reliable homology inference algorithm.

Authors: Raja H Ali; Sayyed A Muhammad; Lars Arvestad
Journal: BMC Evol Biol Date: 2016-06-04 Impact factor: 3.260