
Connect the dots: exposing hidden protein family connections from the entire sequence tree.

Yaniv Loewenstein, Michal Linial.

Abstract

MOTIVATION: Mapping remote evolutionary links is a classic computational problem of much interest. Relating protein families allows functional and structural inference on uncharacterized families. Because their sequences have diverged beyond reliable alignment, such families are too remote to be related by conventional methods.

APPROACH: We present a method to systematically identify remote evolutionary relations between protein families, leveraging a novel evolution-driven tree of all protein sequences and families. A global approach, which considers the entire volume of similarities while clustering sequences, yields a robust tree in which very faint evolutionary links can be traced. The method systematically scans the tree for clusters that partition exceptionally well into extant protein families, suggesting an evolutionary breakpoint in a putative ancient superfamily. The method requires neither family profiles (or HMMs) nor multiple alignments.
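
The tree scan described under APPROACH can be illustrated with a small Python sketch. The abstract does not give the authors' actual scoring, so the criterion below is an assumption chosen for illustration only: an internal node is flagged as a putative breakpoint when every child subtree holds at least `min_size` sequences, each child is dominated by one family with purity of at least `min_purity`, and those dominant families all differ. The `Node` class, the `find_breakpoints` and `leaf` functions, both thresholds, and the `PF_A`/`PF_B` labels are hypothetical; this is not the published algorithm.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    """Hypothetical tree node: leaves carry a family label, internal nodes do not."""
    family: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def find_breakpoints(root: Node, min_purity: float = 0.9,
                     min_size: int = 2) -> Set[Tuple[str, str]]:
    """Return pairs of families whose clusters meet at a putative breakpoint.

    Assumed criterion (not the published one): every child subtree has at
    least `min_size` leaves, is dominated by one family with purity
    >= `min_purity`, and the dominant families of the children all differ.
    """
    links: Set[Tuple[str, str]] = set()

    def visit(node: Node) -> Counter:
        # Post-order traversal: returns family counts for all leaves below node.
        if not node.children:
            return Counter([node.family])
        child_counts = [visit(child) for child in node.children]

        dominant: List[str] = []
        qualifies = len(child_counts) >= 2
        for counts in child_counts:
            size = sum(counts.values())
            family, top = counts.most_common(1)[0]
            if size < min_size or top / size < min_purity:
                qualifies = False
                break
            dominant.append(family)

        # Distinct, nearly pure branches suggest this node joins extant families
        # into a putative superfamily; record every pair of those families.
        if qualifies and len(set(dominant)) == len(dominant):
            links.update(combinations(sorted(dominant), 2))

        total: Counter = Counter()
        for counts in child_counts:
            total.update(counts)
        return total

    visit(root)
    return links


def leaf(family: str) -> Node:
    """Hypothetical helper: a leaf standing for one sequence of `family`."""
    return Node(family=family)


# Toy tree: one branch of PF_A sequences, one branch of PF_B sequences.
tree = Node(children=[
    Node(children=[leaf("PF_A"), leaf("PF_A"), leaf("PF_A")]),
    Node(children=[leaf("PF_B"), leaf("PF_B")]),
])
print(find_breakpoints(tree))  # {('PF_A', 'PF_B')}
```

In this toy tree the root splits cleanly into a PF_A branch and a PF_B branch, so that pair is reported as a candidate link. Lowering `min_purity` lets the scan tolerate branches contaminated by a few sequences from other families, at the likely cost of more spurious links.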
RESULTS: Considering the entire Pfam database, we suggest 710 links between protein families, 125 of which are confirmed by existing Pfam clans. The quality of our predictions is also validated by structural assignments. We further provide an intrinsic characterization of the validity of our results and give examples of new biological findings from our systematic scan. For example, we relate several bacterial pore-forming toxin families and then link them with a novel family of eukaryotic toxins expressed in plants and fish venom, and notably also with uncharacterized proteins from human pathogens.

AVAILABILITY: A detailed list of putative homologous superfamilies, including 210 families of unknown function, is available online: http://www.protonet.cs.huji.ac.il/dots


Year:  2008        PMID: 18689824     DOI: 10.1093/bioinformatics/btn301

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937