Towards a complete map of the protein space based on a unified sequence and structure analysis of all known proteins.

G Yona, M Levitt.

Abstract

In search of global principles that may explain the organization of the space of all possible proteins, we study all known protein sequences and structures. In this paper we present a global map of the protein space based on our analysis. Our protein space contains all protein sequences in a non-redundant (NR) database, which includes all major sequence databases. Using the PSI-BLAST procedure we defined 4,670 clusters of related sequences in this space. Of these clusters, 1,421 are centered on a sequence of known structure. All 4,670 clusters were then compared using either a structure metric (when 3D structures are known) or a novel sequence profile metric. These scores were used to define a unified and consistent metric between all clusters. Two schemes were employed to organize these clusters in a meta-organization. The first uses a graph theory method and clusters the clusters into a hierarchical organization. This organization extends our ability to predict the structure and function of many proteins beyond what is possible with existing tools for sequence analysis. The second uses a variation on a multidimensional scaling technique to embed the clusters in a low-dimensional real space. This last approach resulted in a projection of the protein space onto a 2D plane that provides us with a bird's-eye view of the protein space. Based on this map we suggest a list of possible target sequences with unknown structure that are likely to adopt new, unknown folds.
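
The embedding step can be illustrated with a minimal sketch. The abstract only says that "a variation on a multidimensional scaling technique" was used, so the code below is not the authors' method: it assumes plain classical (Torgerson) multidimensional scaling, and the function name `classical_mds` and the four-point distance matrix `toy_dist` are hypothetical stand-ins for the real inter-cluster metric.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Embed points in n_components dimensions from a symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the eigenvectors that belong to the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues, which appear when distances are not Euclidean.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy distances between four hypothetical clusters (corners of a unit square).
toy_dist = np.array([
    [0.0, 1.0, np.sqrt(2), 1.0],
    [1.0, 0.0, 1.0, np.sqrt(2)],
    [np.sqrt(2), 1.0, 0.0, 1.0],
    [1.0, np.sqrt(2), 1.0, 0.0],
])
coords = classical_mds(toy_dist)  # one 2D coordinate pair per cluster
```

Because the toy distances are exactly Euclidean in two dimensions, the 2D coordinates reproduce them up to rotation and reflection. A metric that mixes structure and profile scores would generally not embed exactly, and the clipped eigenvalues would then indicate how much distortion the projection introduces.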

Year:  2000        PMID: 10977100

Source DB:  PubMed          Journal:  Proc Int Conf Intell Syst Mol Biol        ISSN: 1553-0833


  8 in total

1.  Exploring the nonlinear geometry of protein homology.

Authors:  Michael A Farnum; Huafeng Xu; Dimitris K Agrafiotis
Journal:  Protein Sci       Date:  2003-08       Impact factor: 6.725

2.  Alignment of protein sequences by their profiles.

Authors:  Marc A Marti-Renom; M S Madhusudhan; Andrej Sali
Journal:  Protein Sci       Date:  2004-04       Impact factor: 6.725

3.  Next-Generation Sequencing Techniques Reveal that Genomic Imprinting Is Absent in Day-Old Gallus gallus domesticus Brains.

Authors:  Qiong Wang; Kaiyang Li; Daixi Zhang; Junying Li; Guiyun Xu; Jiangxia Zheng; Ning Yang; Lujiang Qu
Journal:  PLoS One       Date:  2015-07-10       Impact factor: 3.240

4.  Dichotomy in the definition of prescriptive information suggests both prescribed data and prescribed algorithms: biosemiotics applications in genomic systems. (Review)

Authors:  David J D'Onofrio; David L Abel; Donald E Johnson
Journal:  Theor Biol Med Model       Date:  2012-03-14       Impact factor: 2.432

5.  The distance-profile representation and its application to detection of distantly related protein families.

Authors:  Chin-Jen Ku; Golan Yona
Journal:  BMC Bioinformatics       Date:  2005-11-29       Impact factor: 3.169

6.  Structural Bridges through Fold Space.

Authors:  Hannah Edwards; Charlotte M Deane
Journal:  PLoS Comput Biol       Date:  2015-09-15       Impact factor: 4.475

7.  BIOZON: a system for unification, management and analysis of heterogeneous biological data.

Authors:  Aaron Birkland; Golan Yona
Journal:  BMC Bioinformatics       Date:  2006-02-15       Impact factor: 3.169

8.  Progress towards mapping the universe of protein folds.

Authors:  Alastair Grant; David Lee; Christine Orengo
Journal:  Genome Biol       Date:  2004-04-29       Impact factor: 13.583

