Monte Carlo estimation of the number of possible protein folds: effects of sampling bias and folds distributions.

Hadas Leonov, Joseph S B Mitchell, Isaiah T Arkin.

Abstract

The estimation of the number of protein folds in nature is a matter of considerable interest. In this study, a Monte Carlo method employing the broken stick model is used to assign a given number of proteins into a given number of folds. Subsequently, random, integer, non-repeating numbers are generated in order to simulate the process of fold discovery. With this conceptual framework at hand, the effects of two factors upon the fold identification process were investigated: (1) the nature of folds distributions and (2) preferential sampling bias of previously identified folds. Depending on the type of distribution, dividing 100,000 proteins into 1,000 folds resulted in 10-30% of the folds having 10 or fewer proteins per fold, approximately 10% of the folds having 10-20 proteins per fold, 31-45% having 20-100 proteins per fold, and >30% of the folds having more than 100 proteins per fold. After randomly sampling one tenth of the proteins, 68-96% of the folds were identified. These percentages depend on both the folds distribution and whether sampling is biased or unbiased. Only upon increasing the sampling bias for previously identified folds to 1,000 did the model result in a reduction of the number of proteins identified by an order of magnitude (approximately 9%). Thus, assuming the structures of one tenth of the population of proteins in nature have been solved, the results of the Monte Carlo simulation are more consistent with recent lower estimates of the number of folds, ≤1,000. Any deviation from this estimate would reflect significant bias in the experimental sampling of protein structure, and/or substantially nonuniform folds distribution, manifested in a large number of single-fold proteins. Copyright 2003 Wiley-Liss, Inc.
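The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which broken stick variant or bias weighting was used, so the code below assumes the simplest variant (distinct random integer cut points, so every fold holds at least one protein) and models only unbiased sampling without replacement. The function names `broken_stick` and `fraction_of_folds_found` are hypothetical, chosen for illustration.

```python
import random


def broken_stick(n_proteins, n_folds, rng):
    """Split n_proteins into n_folds non-empty folds.

    Assumed variant: choose n_folds - 1 distinct integer cut points on
    the "stick" 1..n_proteins-1; the gaps between cuts are fold sizes.
    """
    cuts = sorted(rng.sample(range(1, n_proteins), n_folds - 1))
    bounds = [0] + cuts + [n_proteins]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


def fraction_of_folds_found(sizes, sample_fraction, rng):
    """Sample proteins without replacement (non-repeating draws) and
    return the fraction of folds represented at least once."""
    # Label every protein with the index of the fold it belongs to.
    labels = [fold for fold, size in enumerate(sizes) for _ in range(size)]
    n_sample = round(len(labels) * sample_fraction)
    sampled = rng.sample(labels, n_sample)
    return len(set(sampled)) / len(sizes)


rng = random.Random(0)  # fixed seed so the run is repeatable
sizes = broken_stick(100_000, 1_000, rng)
found = fraction_of_folds_found(sizes, 0.1, rng)
print(f"folds identified after sampling 10% of proteins: {found:.1%}")
```

Extending the sketch to the biased case would require weighting draws toward proteins from already identified folds; the form of that weighting is not given in the abstract, so it is left out here.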

Year:  2003        PMID: 12696047     DOI: 10.1002/prot.10336

Source DB:  PubMed          Journal:  Proteins        ISSN: 0887-3585


  3 in total

1.  EPO-KB: a searchable knowledge base of biomarker to protein links.

Authors:  Jonathan L Lustgarten; Chad Kimmel; Henrik Ryberg; William Hogan
Journal:  Bioinformatics       Date:  2008-04-09       Impact factor: 6.937

2.  Novel protein folds and their nonsequential structural analogs.

Authors:  Aysam Guerler; Ernst-Walter Knapp
Journal:  Protein Sci       Date:  2008-06-26       Impact factor: 6.725

3.  Progress towards mapping the universe of protein folds.

Authors:  Alastair Grant; David Lee; Christine Orengo
Journal:  Genome Biol       Date:  2004-04-29       Impact factor: 13.583
