Euclidian space and grouping of biological objects.

Vyacheslav N Grishin, Nick V Grishin.

Abstract

MOTIVATION: Biological objects tend to cluster into discrete groups. Objects within a group typically possess similar properties. It is important to have fast and efficient tools for grouping objects that result in biologically meaningful clusters. Protein sequences reflect biological diversity and offer an extraordinary variety of objects for polishing clustering strategies. Grouping of sequences should reflect their evolutionary history and their functional properties. Visualization of relationships between sequences is of no less importance. Tree-building methods are typically used for such visualization. An alternative concept to visualization is a multidimensional sequence space. In this space, proteins are defined as points and distances between the points reflect the relationships between the proteins. Such a space can also be a basis for model-based clustering strategies that typically produce results correlating better with biological properties of proteins.
RESULTS: We developed an approach to classification of biological objects that combines evolutionary measures of their similarity with a model-based clustering procedure. We apply the methodology to amino acid sequences. In the first step, given a multiple sequence alignment, we estimate evolutionary distances between proteins measured in expected numbers of amino acid substitutions per site. These distances are additive and are suitable for evolutionary tree reconstruction. In the second step, we find the best-fit approximation of the evolutionary distances by Euclidian distances and thus represent each protein by a point in a multidimensional space. The Euclidian space may be projected onto two or three dimensions and the projections can be used to visualize relationships between proteins. In the third step, we find a non-parametric estimate of the probability density of the points and cluster the points that belong to the same local maximum of this density into a group. The number of groups is controlled by a sigma-parameter that determines the shape of the density estimate and the number of maxima in it. The grouping procedure outperforms commonly used methods such as UPGMA and single linkage clustering.
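
The second and third steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the best-fit Euclidian approximation or the density estimate is computed, so the sketch assumes classical (Torgerson) multidimensional scaling for the embedding and a Gaussian kernel density estimate whose local maxima are found by mean-shift iterations. The function names `classical_mds` and `density_mode_clusters`, the merge threshold of half a sigma, and the toy distance matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Embed objects as points whose Euclidean distances approximate D.

    Assumption: classical (Torgerson) scaling stands in for the
    best-fit procedure, which the abstract does not specify.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]          # keep the `dim` largest
    w_top = np.clip(w[idx], 0.0, None)       # discard negative (non-Euclidean) parts
    return V[:, idx] * np.sqrt(w_top)

def density_mode_clusters(X, sigma, n_iter=200, tol=1e-6):
    """Group points that climb to the same local maximum of a Gaussian
    kernel density estimate with bandwidth `sigma` (mean-shift iterations)."""
    X = np.asarray(X, dtype=float)
    Y = X.copy()
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        W = np.exp(-d2 / (2.0 * sigma ** 2))             # kernel weights
        Y_new = (W @ X) / W.sum(axis=1, keepdims=True)   # weighted mean of the data
        shift = np.abs(Y_new - Y).max()
        Y = Y_new
        if shift < tol:
            break
    # Points whose converged positions nearly coincide share a maximum.
    # The 0.5 * sigma merge threshold is a heuristic assumption.
    labels = np.full(len(X), -1, dtype=int)
    modes = []
    for i, y in enumerate(Y):
        for k, m in enumerate(modes):
            if np.linalg.norm(y - m) < 0.5 * sigma:
                labels[i] = k
                break
        else:
            modes.append(y)
            labels[i] = len(modes) - 1
    return labels

# Toy distance matrix (made-up values, not real evolutionary distances):
# six objects forming two tight groups.
pos = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = np.abs(pos[:, None] - pos[None, :])

X = classical_mds(D, dim=2)                     # points for plotting or clustering
labels = density_mode_clusters(X, sigma=0.5)    # small sigma: more maxima
labels_wide = density_mode_clusters(X, sigma=10.0)  # large sigma: fewer maxima
print(labels, labels_wide)
```

In this sketch `sigma` plays the role the abstract assigns to the sigma-parameter: a narrow kernel leaves several density maxima and hence several groups, while a wide kernel smooths them into fewer.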

Year:  2002        PMID: 12424125     DOI: 10.1093/bioinformatics/18.11.1523

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  8 in total

1.  Exploring the nonlinear geometry of protein homology.

Authors:  Michael A Farnum; Huafeng Xu; Dimitris K Agrafiotis
Journal:  Protein Sci       Date:  2003-08       Impact factor: 6.725

2.  Double-stranded DNA bacteriophage prohead protease is homologous to herpesvirus protease.

Authors:  Hua Cheng; Nan Shen; Jimin Pei; Nick V Grishin
Journal:  Protein Sci       Date:  2004-08       Impact factor: 6.725

3.  The P5 protein from bacteriophage phi-6 is a distant homolog of lytic transglycosylases.

Authors:  Jimin Pei; Nick V Grishin
Journal:  Protein Sci       Date:  2005-03-31       Impact factor: 6.725

4.  Using protein design for homology detection and active site searches.

Authors:  Jimin Pei; Nikolay V Dokholyan; Eugene I Shakhnovich; Nick V Grishin
Journal:  Proc Natl Acad Sci U S A       Date:  2003-09-15       Impact factor: 11.205

5.  EDD, a novel phosphotransferase domain common to mannose transporter EIIA, dihydroxyacetone kinase, and DegV.

Authors:  Lisa N Kinch; Sara Cheek; Nick V Grishin
Journal:  Protein Sci       Date:  2005-01-04       Impact factor: 6.725

6.  Multidimensional scaling reveals the main evolutionary pathways of class A G-protein-coupled receptors.

Authors:  Julien Pelé; Hervé Abdi; Matthieu Moreau; David Thybert; Marie Chabbert
Journal:  PLoS One       Date:  2011-04-22       Impact factor: 3.240

7.  How Fitch-Margoliash Algorithm can Benefit from Multi Dimensional Scaling.

Authors:  Sylvain Lespinats; Delphine Grando; Eric Maréchal; Mohamed-Ali Hakimi; Olivier Tenaillon; Olivier Bastien
Journal:  Evol Bioinform Online       Date:  2011-06-07       Impact factor: 1.625

8.  Structural Bridges through Fold Space.

Authors:  Hannah Edwards; Charlotte M Deane
Journal:  PLoS Comput Biol       Date:  2015-09-15       Impact factor: 4.475
